Posted to dev@ignite.apache.org by Pavel Tupitsyn <pt...@apache.org> on 2022/02/03 11:23:59 UTC

IEP-83 Thin Client Keepalive (heartbeat)

Igniters,

Please review the proposal to add heartbeat messages to the thin client
protocol (both 2.x and 3.x) and let me know your thoughts:

https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
I agree with it, it is consistent. It seems that I suggested the same.

Thu, Feb 17, 2022, 11:05 Pavel Tupitsyn <pt...@apache.org>:

> I've reviewed the code again and it does not seem right to override
> user-defined heartbeat interval with a *bigger* value,
> so now I only set it to 1/3 of idleTimeout when the user-specified value is
> not already less than that.
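
The rule described in this message can be sketched roughly as follows. This is only an illustration, not the code from the PR: the class and method names (`HeartbeatIntervalSketch`, `effectiveInterval`) are hypothetical, and clamping the result to at least 1 ms is an added assumption.

```java
/** Hypothetical sketch of the interval rule; not the actual Ignite client code. */
final class HeartbeatIntervalSketch {
    /**
     * @param configuredIntervalMs heartbeat interval set by the user (or the default), in ms
     * @param serverIdleTimeoutMs  idle timeout reported by the server, in ms; 0 or less means disabled
     * @return the interval the client would actually use, in ms
     */
    static long effectiveInterval(long configuredIntervalMs, long serverIdleTimeoutMs) {
        if (serverIdleTimeoutMs <= 0) {
            // No idle timeout on the server: keep the configured value.
            return configuredIntervalMs;
        }

        // One third of the idle timeout; clamped to 1 ms (assumption) so it never becomes zero.
        long recommended = Math.max(1L, serverIdleTimeoutMs / 3);

        // Never replace the user-defined value with a bigger one.
        return Math.min(configuredIntervalMs, recommended);
    }
}
```

With a 60 s configured interval and a 9 s idle timeout this yields 3 s, while a 1 s configured interval is left untouched.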
>
> On Wed, Feb 16, 2022 at 7:19 PM Pavel Tupitsyn <pt...@apache.org>
> wrote:
>
> > Ok, let's keep heartbeatInterval then.
> > I've updated the code to reflect our recent agreement, please review.
> >
> > On Tue, Feb 15, 2022 at 8:28 PM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> >> I personally prefer heartbeatInterval
> >>
> >> Tue, Feb 15, 2022, 18:25 Pavel Tupitsyn <pt...@apache.org>:
> >>
> >> > > What about "keepAlive", "keepAliveInterval" then? It looks more common
> >> > > and matches the IEP title :)
> >> > According to Google, HeartbeatInterval has ~169K results, and
> >> > KeepAliveInterval has ~110K :)
> >> >
> >> > In my experience, both are well understood. I am equally willing to use
> >> > either of them.
> >> > Any other opinions?
> >> >
> >> > On Tue, Feb 15, 2022 at 6:11 PM Maksim Timonin <timoninmaxim@apache.org>
> >> > wrote:
> >> >
> >> > > What about "keepAlive", "keepAliveInterval" then? It looks more common and
> >> > > matches the IEP title :)
> >> > >
> >> > > On Tue, Feb 15, 2022 at 5:54 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > wrote:
> >> > >
> >> > > > To summarize, we add two properties to the ClientConfiguration:
> >> > > > bool heartbeatsEnabled = true;
> >> > > > long defaultHeartbeatInterval = 60_000; // Default 1 minute, used
> >> > > >
> >> > > > Logic:
> >> > > > if (heartbeatsEnabled) {
> >> > > >   heartbeatInterval = serverIdleTimeout > 0 ? serverIdleTimeout / 3 : defaultHeartbeatInterval;
> >> > > > }
> >> > > >
> >> > > >
> >> > > > Thoughts, objections?
> >> > > >
> >> > > > On Tue, Feb 15, 2022 at 4:32 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Pavel, sorry, I made a mistake. But the current behaviour is OK for me. This
> >> > > > > timeout cannot be changed on the server side at runtime. But we can simplify
> >> > > > > the protocol and use just one opcode and message.
> >> > > > >
> >> > > > > Tue, Feb 15, 2022, 14:54 Ivan Daschinsky <ivandasch@gmail.com>:
> >> > > > >
> >> > > > > > > Idle timeout can't change, why send it back with every heartbeat
> >> > > > > > > response?
> >> > > > > > Maybe I am wrong, but this is the behaviour I see in the code. If I am
> >> > > > > > wrong, then this behaviour is OK for me.
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <ptupitsyn@apache.org>:
> >> > > > > >
> >> > > > > >> Ivan, I mostly agree with your proposal, except this point:
> >> > > > > >>
> >> > > > > >> > Response to heartbeat request -- is idle timeout
> >> > > > > >> Idle timeout can't change, why send it back with every heartbeat
> >> > > > > >> response?
> >> > > > > >>
> >> > > > > >> > possible cases with cluster restart, upgrade
> >> > > > > >> In those cases, a new connection will be established, and we'll retrieve
> >> > > > > >> the new timeout after the handshake.
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <timoninmaxim@apache.org>
> >> > > > > >> wrote:
> >> > > > > >>
> >> > > > > >> > Hi Ivan,
> >> > > > > >> >
> >> > > > > >> > The cases you described sound reasonable to me. Then the client should just
> >> > > > > >> > set the `keepAlive` flag, and it just works.
> >> > > > > >> >
> >> > > > > >> > So, there are 3 branches:
> >> > > > > >> > 1. Users don't configure keepAlive at all.
> >> > > > > >> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
> >> > > > > >> > 3. Users configure keepAlive (boolean).
> >> > > > > >> >
> >> > > > > >> > AFAIU, Pavel's proposal is about covering the second case only. But
> >> > > > > >> > the 2nd and 3rd actually don't conflict with each other. I think that for
> >> > > > > >> > both branches the cluster should respond with the idleTimeout value on every
> >> > > > > >> > keep-alive client request, because there are possible cases with cluster
> >> > > > > >> > restart, upgrade, etc. Clients should check every response for a changed
> >> > > > > >> > idleTimeout: in the 2nd case write a WARN message, and in the 3rd
> >> > > > > >> > reconfigure themselves.
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > >> > wrote:
> >> > > > > >> >
> >> > > > > >> > > Regarding discussion here [1]
> >> > > > > >> > >
> >> > > > > >> > > I suppose that this feature, although Pavel's initial intention was
> >> > > > > >> > > different, can drastically improve the usage pattern of thin clients and
> >> > > > > >> > > give a lot of opportunities if the following is done:
> >> > > > > >> > >
> >> > > > > >> > > 1. GridNioServer has a great feature -- idle timeout. If a server does not
> >> > > > > >> > > receive anything from a client, the client will be kicked off.
> >> > > > > >> > >     But there are some scenarios that make the use of this feature
> >> > > > > >> > > impossible:
> >> > > > > >> > > a. Multiple workers waiting for batch tasks with a relatively low request
> >> > > > > >> > > rate -- these services will often be kicked off and must reconnect.
> >> > > > > >> > > In order to prevent this behaviour, the user must implement a kind of
> >> > > > > >> > > heartbeating themselves.
> >> > > > > >> > > b. Quite often a user may want to implement the leader-follower pattern for
> >> > > > > >> > > services for HA, so followers will also be considered idle. Kicking off
> >> > > > > >> > > these followers is not acceptable, so the user must also implement
> >> > > > > >> > > heartbeating themselves.
> >> > > > > >> > >
> >> > > > > >> > > My proposition is:
> >> > > > > >> > > 1. Add two settings -- a flag to enable/disable heartbeats, and a very
> >> > > > > >> > > optional heartbeat timeout. Set the flag to true by default, and the timeout
> >> > > > > >> > > to the default heartbeat timeout.
> >> > > > > >> > > 2. If server and client both support this feature, and heartbeats are not
> >> > > > > >> > > explicitly disabled on client side:
> >> > > > > >> > > 3. Response to heartbeat request -- is idle timeout. If idle timeout is set
> >> > > > > >> > > on the server side, set heartbeat timeout to one-third of it, otherwise set
> >> > > > > >> > > it to the default or specified value.
> >> > > > > >> > >
> >> > > > > >> > > Pros:
> >> > > > > >> > > 1. Easy to set up -- just a flag on the client side and a timeout on the
> >> > > > > >> > > server side.
> >> > > > > >> > > 2. Hard to configure improperly, i.e. to set a heartbeat timeout that is not
> >> > > > > >> > > short enough to prevent being kicked out by the server.
> >> > > > > >> > > 3. If the user just wants heartbeats without setting an idle timeout --
> >> > > > > >> > > heartbeats are on by default, with a reasonable timeout.
> >> > > > > >> > >
> >> > > > > >> > > Cons:
> >> > > > > >> > > 1. If someone relies on the old behavior and just wants to drop their
> >> > > > > >> > > clients on timeout -- this will not work without reconfiguring, they have
> >> > > > > >> > > to disable heartbeats.
> >> > > > > >> > > But I cannot even imagine that someone would find this behaviour desirable.
> >> > > > > >> > > I strongly believe that this behaviour prevents users from using
> >> > > > > >> > > idleTimeout on server side.
> >> > > > > >> > >
> >> > > > > >> > > [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
> >> > > > > >> > >
> >> > > > > >> > > Fri, Feb 11, 2022 at 10:58, Pavel Tupitsyn <ptupitsyn@apache.org>:
> >> > > > > >> > >
> >> > > > > >> > > > I've prepared a PR, please have a look:
> >> > > > > >> > > > https://github.com/apache/ignite/pull/9817
> >> > > > > >> > > >
> >> > > > > >> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > >> > > > wrote:
> >> > > > > >> > > >
> >> > > > > >> > > > > I see potential in this feature, especially if we use something like
> >> > > > > >> > > > > continuous queries. Stale clients can consume a lot of resources, and it is
> >> > > > > >> > > > > worth kicking these clients out.
> >> > > > > >> > > > >
> >> > > > > >> > > > > Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <ptupitsyn@apache.org>:
> >> > > > > >> > > > >
> >> > > > > >> > > > > > > If we use new approach, we can reduce this timeout. But this can affect
> >> > > > > >> > > > > > > old clients.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > idleTimeout is disabled by default, we are not going to change this.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > > Also, let's think about that sending heartbeats and interval of sending
> >> > > > > >> > > > > > > heartbeats could be calculated on the server side (i.e. one third of idle
> >> > > > > >> > > > > > > timeout) and sent to the client during handshake.
> >> > > > > >> > > > > > > Also we can introduce something like a negotiation mechanism as in
> >> > > > > >> > > > > > > zookeeper.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > I tend to agree with Maksim here, let's keep it simple and explicit.
> >> > > > > >> > > > > > Log a warning, but don't do anything clever.
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > >> > > > > > wrote:
> >> > > > > >> > > > > >
> >> > > > > >> > > > > > > >> idleTimeout already exists, I don't think we should change the way
> >> > > > > >> > > > > > > >> it works (or did I misunderstand you?)
> >> > > > > >> > > > > > > If we use new approach, we can reduce this timeout. But this can affect
> >> > > > > >> > > > > > > old clients.
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > Also, let's think about that sending heartbeats and interval of sending
> >> > > > > >> > > > > > > heartbeats could be calculated on the server side (i.e. one third of idle
> >> > > > > >> > > > > > > timeout) and sent to the client during handshake.
> >> > > > > >> > > > > > > Also we can introduce something like a negotiation mechanism as in
> >> > > > > >> > > > > > > zookeeper.
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <ptupitsyn@apache.org>:
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > > > > Igor,
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > Maybe clients should pass this information on to the handshake.
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > Do you think we should log a mismatched timeout warning on the server,
> >> > > > > >> > > > > > > > not on the client?
> >> > > > > >> > > > > > > > Or should we do both?
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other details
> >> > > > > >> > > > > > > > discussed above.
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <isapego@apache.org>
> >> > > > > >> > > > > > > > wrote:
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > The feature seems useful to me, as it makes connection management more
> >> > > > > >> > > > > > > > > robust and predictable.
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > I agree with Pavel that we should print a warning when the heartbeat period
> >> > > > > >> > > > > > > > > is larger than the idle timeout, but I see a problem here: the idle timeout
> >> > > > > >> > > > > > > > > is configured on the server and is not known to clients, while heartbeats
> >> > > > > >> > > > > > > > > are configured on clients and their period is not known to the server.
> >> > > > > >> > > > > > > > > Maybe clients should pass this information on to the handshake.
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > Regarding the Python and PHP clients - can't we use some kind of timers to
> >> > > > > >> > > > > > > > > implement this feature?
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > Best Regards,
> >> > > > > >> > > > > > > > > Igor
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > >> > > > > > > > > wrote:
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > Maksim, agree. Let's not be too clever and only log a warning.
> >> > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > >> > > > > > > > > > wrote:
> >> > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we should change the way
> >> > > > > >> > > > > > > > > > > it works (or did I misunderstand you?)
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > Of course, enabling heartbeats means that otherwise idle clients will no
> >> > > > > >> > > > > > > > > > > longer be disconnected by the server.
> >> > > > > >> > > > > > > > > > > I think we should cross-link those properties in the documentation and
> >> > > > > >> > > > > > > > > > > explain this behavior.
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > >> > > > > > > > > > > wrote:
> >> > > > > >> > > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> >> > > > > >> > > > > > > > > > >> >> not zero, server disconnects idle clients
> >> > > > > >> > > > > > > > > > >> >>
> >> > > > > >> > > > > > > > > > >> But I suppose it would be great to have:
> >> > > > > >> > > > > > > > > > >> 1. If client supports keep alive, use
> >> > > idleTimeout
> >> > > > > >> > > > > > > > > > >> 2. If not, do not use it.
> >> > > > > >> > > > > > > > > > >>
> >> > > > > >> > > > > > > > > > >> But I am not sure if it is correct or not.
> >> > > > > >> > > > > > > > > > >>
> >> > > > > >> > > > > > > > > > >> Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org>:
> >> > > > > >> > > > > > > > > > >>
> >> > > > > >> > > > > > > > > > >> > I believe explicit is better than implicit :) Also, in case of dynamic
> >> > > > > >> > > > > > > > > > >> > calculation of the timeout, it can change dynamically: for example,
> >> > > > > >> > > > > > > > > > >> > restarting a cluster with a different configuration should reconfigure
> >> > > > > >> > > > > > > > > > >> > clients too. Looks complicated.
> >> > > > > >> > > > > > > > > > >> >
> >> > > > > >> > > > > > > > > > >> > My vote is for WARN + javadocs mentioning this issue.
> >> > > > > >> > > > > > > > > > >> >
> >> > > > > >> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > >> > > > > > > > > > >> > wrote:
> >> > > > > >> > > > > > > > > > >> >
> >> > > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
> >> > > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> >> > > > > >> > > > > > > > > > >> > >
> >> > > > > >> > > > > > > > > > >> > > I think we should either log a WARN, or retrieve idleTimeout from server
> >> > > > > >> > > > > > > > > > >> > > and configure heartbeatTimeout accordingly (e.g. divide by 2).
> >> > > > > >> > > > > > > > > > >> > > Thoughts?
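
The first option in this message (only logging a WARN) could look roughly like the sketch below. The class and method names are hypothetical and `java.util.logging` stands in for whatever logger the client really uses; treating an interval equal to the idle timeout as a mismatch too is an assumption, since the thread only says "greater than".

```java
import java.util.logging.Logger;

/** Hypothetical sketch of the mismatch warning; not the actual Ignite client code. */
final class HeartbeatWarningSketch {
    private static final Logger LOG = Logger.getLogger(HeartbeatWarningSketch.class.getName());

    /**
     * Logs a warning when heartbeats are too infrequent to keep the connection alive.
     *
     * @param heartbeatIntervalMs client-side heartbeat interval, in ms
     * @param serverIdleTimeoutMs server-side idle timeout, in ms; 0 or less means disabled
     * @return true if a warning was logged
     */
    static boolean warnIfTooSlow(long heartbeatIntervalMs, long serverIdleTimeoutMs) {
        // Assumption: an interval equal to the timeout is also reported.
        boolean tooSlow = serverIdleTimeoutMs > 0 && heartbeatIntervalMs >= serverIdleTimeoutMs;

        if (tooSlow) {
            LOG.warning("Heartbeat interval (" + heartbeatIntervalMs
                + " ms) is not less than the server idle timeout (" + serverIdleTimeoutMs
                + " ms); idle connections may be dropped by the server.");
        }

        return tooSlow;
    }
}
```

The client only reports the mismatch here and leaves the configured interval unchanged.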
> >> > > > > >> > > > > > > > > > >> > >
> >> > > > > >> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <timoninmaxim@apache.org>
> >> > > > > >> > > > > > > > > > >> > > wrote:
> >> > > > > >> > > > > > > > > > >> > >
> >> > > > > >> > > > > > > > > > >> > > > Hi Pavel,
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the flag of changed topology is
> >> > > > > >> > > > > > > > > > >> > > > lazy. I also missed that the keepAlive setting is configured on the
> >> > > > > >> > > > > > > > > > >> > > > client side (as opposed to idleTimeout, which is on the server side).
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > > > Now I understand, this feature can be helpful then. Every client can
> >> > > > > >> > > > > > > > > > >> > > > configure itself in case it may be idle sometimes, and choose
> >> > > > > >> > > > > > > > > > >> > > > an appropriate timeout by itself too. And by default the feature should
> >> > > > > >> > > > > > > > > > >> > > > be disabled.
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
> >> > > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > >> > > > > > > > > > >> > > > wrote:
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > > > > Ivan,
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > > > I suggest the following:
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> >> > > > > >> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> >> > > > > >> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> >> > > > > >> > > > > > > > > > >> > > > > certain period of time
> >> > > > > >> > > > > > > > > > >> > > > > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> >> > > > > >> > > > > > > > > > >> > > > > not zero, server disconnects idle clients
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > > > This way we don't need server->client keepalives, as you correctly noted.
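
Step 2 of this list (send OP_KEEP_ALIVE only when the connection has been idle for a certain period) can be sketched as a small idle tracker. Everything here is hypothetical: the class and method names are made up for illustration, time is passed in explicitly to keep the logic deterministic, and the actual sending of the message is left out.

```java
/** Hypothetical idle tracker for one connection; not thread-safe and not Ignite code. */
final class IdleHeartbeatSketch {
    private final long intervalMs;
    private long lastSendMs;

    /**
     * @param intervalMs idle period after which a heartbeat is due, in ms
     * @param nowMs      current time in ms when the connection is established
     */
    IdleHeartbeatSketch(long intervalMs, long nowMs) {
        this.intervalMs = intervalMs;
        this.lastSendMs = nowMs;
    }

    /** Call whenever any request, including a heartbeat, is written to the connection. */
    void onRequestSent(long nowMs) {
        lastSendMs = nowMs;
    }

    /** Returns true when nothing was sent for at least intervalMs, so a heartbeat is due. */
    boolean heartbeatDue(long nowMs) {
        return nowMs - lastSendMs >= intervalMs;
    }
}
```

A client could poll `heartbeatDue` from a timer and, when it returns true, send the empty keep-alive message and call `onRequestSent`. Regular requests reset the idle period in the same way, so busy connections never send heartbeats.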
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > >> > > > > > > > > > >> > > > > wrote:
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> >> > > > > >> > > > > > > > > > >> > > > > > 1. The client sends a flag in the handshake saying that it supports the
> >> > > > > >> > > > > > > > > > >> > > > > > KEEP_ALIVE feature, and the server takes it into account.
> >> > > > > >> > > > > > > > > > >> > > > > > 2. Each client request can be considered a keep-alive ping.
> >> > > > > >> > > > > > > > > > >> > > > > > 3. A client send failure should be processed using the retry policy.
> >> > > > > >> > > > > > > > > > >> > > > > > 4. The server should not send keep-alive packets, it is redundant, but the
> >> > > > > >> > > > > > > > > > >> > > > > > server should track requests from the client, and if there are no requests
> >> > > > > >> > > > > > > > > > >> > > > > > from a client with the KEEP_ALIVE feature, automatically close the
> >> > > > > >> > > > > > > > > > >> > > > > > connection and free resources.
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > A similar approach is used in ZooKeeper clients.
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <ptupitsyn@apache.org>:
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > Ivan,
> >> > > > > >> > > > > > > > > > >> > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > Ideally, the check should come from both sides.
> >> > > > > >> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to server
> >> > > > > >> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to client
> >> > > > > >> > > > > > > > > > >> > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it is not necessary to
> >> > > > > >> > > > > > > > > > >> > > > > > > implement this in all thin clients.
> >> > > > > >> > > > > > > > > > >> > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > >> > > > > > > > > > >> > > > > > > wrote:
> >> > > > > >> > > > > > > > > > >> > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > I suppose it is a great idea, but this functionality can be hard to
> >> > > > > >> > > > > > > > > > >> > > > > > > > implement for some platforms, e.g. the sync Python client or PHP (there is
> >> > > > > >> > > > > > > > > > >> > > > > > > > no real multithreading in Python (GIL), and PHP is single-threaded by
> >> > > > > >> > > > > > > > > > >> > > > > > > > design). But for async clients it is not very hard to implement.
> >> > > > > >> > > > > > > > > > >> > > > > > > > Nevertheless, this feature should be optional because of possible
> >> > > > > >> > > > > > > > > > >> > > > > > > > technical limitations.
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for the client side? Or can servers take some
> >> > > > > >> > > > > > > > > > >> > > > > > > > action if there is no activity from a thin client (i.e. closing the
> >> > > > > >> > > > > > > > > > >> > > > > > > > context and freeing resources such as query handles and so on)?
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org>:
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation when an Ignite node goes down or
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > somehow removes connection to a thin client
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when, for example, an intermediate
> >> > > > > >> > > > > > > > > > >> > > > > > > > > router is rebooted [1].
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > This is what we seem to have encountered with one of our customers -
> >> > > > > >> > > > > > > > > > >> > > > > > > > > they have a stable cluster, and long-living (multiple days) thin client
> >> > > > > >> > > > > > > > > > >> > > > > > > > > connections which can be idle for some time.
> >> > > > > >> > > > > > > > > > >> > > > > > > > > And only when we send some data on such an idle connection do we
> >> > > > > >> > > > > > > > > > >> > > > > > > > > discover that it is broken.
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > But with enabled (true
> >> by
> >> > > > > default)
> >> > > > > >> > > > > > > > > partitionAwareness
> >> > > > > >> > > > > > > > > > >> > feature
> >> > > > > >> > > > > > > > > > >> > > > > > clients
> >> > > > > >> > > > > > > > > > >> > > > > > > > can
> >> > > > > >> > > > > > > > > > >> > > > > > > > > be notified about
> topology
> >> > > > changes
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > Partition awareness is a
> >> > "lazy"
> >> > > > > >> > > notification
> >> > > > > >> > > > > in
> >> > > > > >> > > > > > a
> >> > > > > >> > > > > > > > form
> >> > > > > >> > > > > > > > > > of
> >> > > > > >> > > > > > > > > > >> a
> >> > > > > >> > > > > > > > > > >> > > > > response
> >> > > > > >> > > > > > > > > > >> > > > > > > > > message flag [2].
> >> > > > > >> > > > > > > > > > >> > > > > > > > > You won't get one on an
> >> idle
> >> > > > > >> connection.
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > the connections are
> >> removed
> >> > > on
> >> > > > > the
> >> > > > > >> > > server
> >> > > > > >> > > > > side
> >> > > > > >> > > > > > > by
> >> > > > > >> > > > > > > > > > client
> >> > > > > >> > > > > > > > > > >> > idle
> >> > > > > >> > > > > > > > > > >> > > > > > timeout
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled
> >> by
> >> > > > > default.
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > is it OK to keep such
> >> > > > connections
> >> > > > > >> > alive
> >> > > > > >> > > > for
> >> > > > > >> > > > > a
> >> > > > > >> > > > > > > long
> >> > > > > >> > > > > > > > > > time
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > I think it is up to the
> >> user.
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > in the case of
> partition
> >> > > > > awareness
> >> > > > > >> > > > features
> >> > > > > >> > > > > it
> >> > > > > >> > > > > > > can
> >> > > > > >> > > > > > > > > > lead
> >> > > > > >> > > > > > > > > > >> to
> >> > > > > >> > > > > > > > > > >> > > > > wasting
> >> > > > > >> > > > > > > > > > >> > > > > > > TCP
> >> > > > > >> > > > > > > > > > >> > > > > > > > > sockets on Ignite nodes,
> >> > can't
> >> > > it
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > Can you please
> elaborate?
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > [1]
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > >
> >> > > > > >> > > > > > > > > > >> >
> >> > > > > >> > > > > > > > > > >>
> >> > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > >
> >> > > > > >> > > > >
> >> > > > > >> > > >
> >> > > > > >> > >
> >> > > > > >> >
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> >> > > > > >> > > > > > > > > > >> > > > > > > > > [2]
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >> > > > > > > > > > >> > >
> >> > > > > >> > > > > > > > > > >> >
> >> > > > > >> > > > > > > > > > >>
> >> > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > >
> >> > > > > >> > > > > > >
> >> > > > > >> > > > > >
> >> > > > > >> > > > >
> >> > > > > >> > > >
> >> > > > > >> > >
> >> > > > > >> >
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> >> > > > > >> > > > > > > > > > >> > > > > > > > > wrote:
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here to get the
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > feature more clearly?
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > As I understand it correctly, half-state is a possible situation when an
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > Ignite node goes down or somehow removes connection to a thin client. But
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > with enabled (true by default) partitionAwareness feature clients can be
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > notified about topology changes. So, there are possible cases:
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps fail fast.
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > But is it OK to connect a client to a single node only?
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second option is
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly idle connections are especially susceptible to this
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > behavior". If I understand correctly the connections are removed on the
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > server side by client idle timeout. Can we just configure the idle timeout
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > for cases where we really need keeping alive idle connections? Are there
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > any other cases with unexpectedly dropped connections?
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such connections alive for a long time?
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > Also in the case of partition awareness features it can lead to wasting TCP
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > Thanks!
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > >> > > > > > > > > > >> > > > > > > > > > wrote:
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >> Igniters,
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin client
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> >> > > > > >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > > > --
> >> > > > > >> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> >> > > > > >> > > > > > > > > > >> > > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > > > --
> >> > > > > >> > > > > > > > > > >> > > > > > Sincerely yours, Ivan Daschinskiy
> >> > > > > >> > > > > > > > > > >> > > > > >
> >> > > > > >> > > > > > > > > > >> > > > >
> >> > > > > >> > > > > > > > > > >> > > >
> >> > > > > >

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
I've reviewed the code again and it does not seem right to override
user-defined heartbeat interval with a *bigger* value,
so now I only set it to 1/3 of idleTimeout when the user-specified value is
not already less than that.
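
For readers skimming the thread, here is a minimal sketch of the rule described above. It is not the code from the PR: the class name, method name, and the 1 ms floor are hypothetical and only illustrate "shrink the user's value to 1/3 of idleTimeout, never enlarge it".

```java
// Hypothetical helper for illustration; not the actual Ignite client code.
class HeartbeatIntervalSketch {
    /**
     * Picks the heartbeat interval (ms) a client would use.
     *
     * @param configuredInterval user-specified heartbeatInterval, in ms
     * @param serverIdleTimeout  server idleTimeout in ms; 0 or less means disabled
     */
    static long effectiveInterval(long configuredInterval, long serverIdleTimeout) {
        if (serverIdleTimeout <= 0)
            return configuredInterval; // No idle timeout: nothing to adjust.

        // One third of the idle timeout. The floor of 1 ms is an added assumption
        // so that very small timeouts do not produce a zero interval.
        long recommended = Math.max(1, serverIdleTimeout / 3);

        // Only ever shrink the user's value, never replace it with a bigger one.
        return Math.min(configuredInterval, recommended);
    }

    public static void main(String[] args) {
        System.out.println(effectiveInterval(30_000, 0));      // idle timeout disabled
        System.out.println(effectiveInterval(30_000, 60_000)); // user value too large
        System.out.println(effectiveInterval(5_000, 60_000));  // user value already small enough
    }
}
```

With a 60 s idle timeout, a 30 s user interval would be reduced to 20 s, while a 5 s user interval would be left untouched.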

On Wed, Feb 16, 2022 at 7:19 PM Pavel Tupitsyn <pt...@apache.org> wrote:

> Ok, let's keep heartbeatInterval then.
> I've updated the code to reflect our recent agreement, please review.
>
> On Tue, Feb 15, 2022 at 8:28 PM Ivan Daschinsky <iv...@gmail.com>
> wrote:
>
>> I personally prefer heartbeatInterval
>>
>> Tue, Feb 15, 2022, 18:25 Pavel Tupitsyn <pt...@apache.org>:
>>
>> > > What about "keepAlive", "keepAliveInterval" then? It looks more common
>> > > and matches the IEP title :)
>> > According to Google, HeartbeatInterval has ~169K results, and
>> > KeepAliveInterval has ~110K :)
>> >
>> > In my experience, both are well understood. I am equally willing to use
>> > either of them.
>> > Any other opinions?
>> >
>> > On Tue, Feb 15, 2022 at 6:11 PM Maksim Timonin <timoninmaxim@apache.org>
>> > wrote:
>> >
>> > > What about "keepAlive", "keepAliveInterval" then? It looks more common
>> > > and matches the IEP title :)
>> > >
>> > > On Tue, Feb 15, 2022 at 5:54 PM Pavel Tupitsyn <pt...@apache.org>
>> > > wrote:
>> > >
>> > > > To summarize, we add two properties to the ClientConfiguration:
>> > > > bool heartbeatsEnabled = true;
>> > > > long defaultHeartbeatInterval = 60_000; // Default 1 minute, used
>> > > >
>> > > > Logic:
>> > > > if (heartbeatsEnabled) {
>> > > >   heartbeatInterval = serverIdleTimeout > 0 ? serverIdleTimeout / 3 : defaultHeartbeatInterval;
>> > > > }
>> > > >
>> > > >
>> > > > Thoughts, objections?
>> > > >
>> > > > On Tue, Feb 15, 2022 at 4:32 PM Ivan Daschinsky <ivandasch@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Pavel, sorry, I've made a mistake. But the current behaviour is OK for
>> > > > > me. This timeout cannot be changed on the server side at runtime. But
>> > > > > we can simplify the protocol and just use one opcode and message.
>> > > > >
>> > > > > Tue, Feb 15, 2022, 14:54 Ivan Daschinsky <ivandasch@gmail.com>:
>> > > > >
>> > > > > > > Idle timeout can't change, why send it back with every heartbeat
>> > > > > > > response?
>> > > > > > Maybe I am wrong, but from the code I see this behaviour. But if I am
>> > > > > > wrong, this behaviour is OK for me.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > > >
>> > > > > >> Ivan, I mostly agree with your proposal, except this point:
>> > > > > >>
>> > > > > >> > Response to heartbeat request -- is idle timeout
>> > > > > >> Idle timeout can't change, why send it back with every heartbeat
>> > > > > >> response?
>> > > > > >>
>> > > > > >> > possible cases with cluster restart, upgrade
>> > > > > >> In those cases, a new connection will be established, and we'll
>> > > > > >> retrieve the new timeout after the handshake.
>> > > > > >>
>> > > > > >>
>> > > > > >> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <timoninmaxim@apache.org>
>> > > > > >> wrote:
>> > > > > >>
>> > > > > >> > Hi Ivan,
>> > > > > >> >
>> > > > > >> > Cases you described sound reasonable to me. Then the client should
>> > > > > >> > just set up the `keepAlive` flag, and it just works.
>> > > > > >> >
>> > > > > >> > So, there are 3 branches:
>> > > > > >> > 1. Users don't configure keepAlive at all.
>> > > > > >> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
>> > > > > >> > 3. Users configure keepAlive (boolean).
>> > > > > >> >
>> > > > > >> > AFAIU, Pavel's proposal is about covering the second case only. But
>> > > > > >> > actually the 2nd and 3rd don't conflict with each other. I think that
>> > > > > >> > for both branches, a cluster should respond with the idleTimeout value
>> > > > > >> > on every keep alive client request, because there are possible cases
>> > > > > >> > with cluster restart, upgrade, etc. Clients should check every
>> > > > > >> > response for a changed idleTimeout: in the 2nd case, write a WARN
>> > > > > >> > message, and in the 3rd, reconfigure themselves.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <ivandasch@gmail.com>
>> > > > > >> > wrote:
>> > > > > >> >
>> > > > > >> > > Regarding discussion here [1]
>> > > > > >> > >
>> > > > > >> > > I suppose that this feature, despite the fact that the initial
>> > > > > >> > > intention of Pavel was different, can drastically improve the usage
>> > > > > >> > > pattern of thin clients and give a lot of opportunities if the
>> > > > > >> > > following is done:
>> > > > > >> > >
>> > > > > >> > > 1. GridNioServer has a great feature -- idle timeout. If a server
>> > > > > >> > > did not receive anything from a client, the client will be kicked off.
>> > > > > >> > >     But there are some scenarios that make the use of this feature
>> > > > > >> > > impossible:
>> > > > > >> > > a. Multiple workers waiting for batch tasks and a relatively low
>> > > > > >> > > request rate -- these services will often be kicked off and must
>> > > > > >> > > reconnect. In order to prevent this behaviour, the user must
>> > > > > >> > > implement a kind of heartbeating by himself.
>> > > > > >> > > b. Quite often the user may want to implement a leader-follower
>> > > > > >> > > pattern for services for HA, so followers will also be considered
>> > > > > >> > > idle. Kicking off these followers is not acceptable, so the user
>> > > > > >> > > should also implement heartbeating by himself.
>> > > > > >> > >
>> > > > > >> > > My proposition is:
>> > > > > >> > > 1. Add two flags -- enable/disable heartbeats, and a very optional heartbeat
>> > > > > >> > > timeout. Set enable to true by default, and timeout to the default heartbeat
>> > > > > >> > > timeout.
>> > > > > >> > > 2. If server and client both support this feature, and heartbeats are not
>> > > > > >> > > explicitly disabled on client side:
>> > > > > >> > > 3. Response to heartbeat request -- is idle timeout. If idle timeout is set
>> > > > > >> > > on the server side, set heartbeat timeout to one-third of it, otherwise set
>> > > > > >> > > it to the default or specified value.
>> > > > > >> > >
>> > > > > >> > > Pros:
>> > > > > >> > > 1. Easy to set up -- just a flag on the client side and just a timeout
>> > > > > >> > > set on the server side.
>> > > > > >> > > 2. Hard to configure improperly, i.e. to set a heartbeat timeout that is
>> > > > > >> > > not short enough to prevent being kicked out by the server.
>> > > > > >> > > 3. If the user just wants heartbeats without setting idle timeout --
>> > > > > >> > > heartbeats are by default on and with a reasonable timeout.
>> > > > > >> > >
>> > > > > >> > > Cons:
>> > > > > >> > > 1. If someone relies on the old behavior and just wants to drop his
>> > > > > >> > > clients on timeout -- this will not work without reconfiguring, he
>> > > > > >> > > should disable heartbeats.
>> > > > > >> > > But I cannot even imagine that someone will find this behaviour
>> > > > > >> > > desirable. I strongly believe that this behaviour prevents users from
>> > > > > >> > > using idleTimeout on server side.
>> > > > > >> > >
>> > > > > >> > > [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
>> > > > > >> > >
>> > > > > >> > > Fri, Feb 11, 2022 at 10:58, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > > >> > >
>> > > > > >> > > > I've prepared a PR, please have a look:
>> > > > > >> > > > https://github.com/apache/ignite/pull/9817
>> > > > > >> > > >
>> > > > > >> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <ivandasch@gmail.com>
>> > > > > >> > > > wrote:
>> > > > > >> > > >
>> > > > > >> > > > > I see potential in this feature, especially if we use something
>> > > > > >> > > > > like continuous query. Stale clients can consume a lot of
>> > > > > >> > > > > resources and it is worth kicking these clients out.
>> > > > > >> > > > >
>> > > > > >> > > > > Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > > >> > > > >
>> > > > > >> > > > > > > If we use new approach, we can reduce this timeout. But this
>> > > > > >> > > > > > > can affect old clients.
>> > > > > >> > > > > >
>> > > > > >> > > > > > idleTimeout is disabled by default, we are not going to change
>> > > > > >> > > > > > this.
>> > > > > >> > > > > >
>> > > > > >> > > > > > > Also, let's think about that sending heartbeats and interval of
>> > > > > >> > > > > > > sending heartbeats could be calculated on the server side (i.e.
>> > > > > >> > > > > > > one third of idle timeout) and sent to the client during handshake.
>> > > > > >> > > > > > > Also we can introduce something like a negotiation mechanism as in
>> > > > > >> > > > > > > zookeeper.
>> > > > > >> > > > > >
>> > > > > >> > > > > > I tend to agree with Maksim here, let's keep it simple and
>> > > > > >> > > > > > explicit.
>> > > > > >> > > > > > Log a warning, but don't do anything clever.
>> > > > > >> > > > > >
>> > > > > >> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <ivandasch@gmail.com>
>> > > > > >> > > > > > wrote:
>> > > > > >> > > > > >
>> > > > > >> > > > > > > >> idleTimeout already exists, I don't think we should change the
>> > > > > >> > > > > > > >> way it works (or did I misunderstand you?)
>> > > > > >> > > > > > > If we use new approach, we can reduce this timeout. But this can
>> > > > > >> > > > > > > affect old clients.
>> > > > > >> > > > > > >
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Also, let's think about that sending heartbeats and interval of
>> > > > > >> > > > > > > sending heartbeats could be calculated on the server side (i.e.
>> > > > > >> > > > > > > one third of idle timeout) and sent to the client
>> > > > > >> > > > > > > during handshake.
>> > > > > >> > > > > > > Also we can introduce something like a negotiation mechanism as in
>> > > > > >> > > > > > > zookeeper.
>> > > > > >> > > > > > >
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > > Igor,
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > Maybe clients should pass this information on to the handshake.
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > Do you think we should log a mismatched timeout warning on the
>> > > > > >> > > > > > > > server, not on the client?
>> > > > > >> > > > > > > > Or should we do both?
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other
>> > > > > >> > > > > > > > details discussed above.
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <isapego@apache.org>
>> > > > > >> > > > > > > > wrote:
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > Feature seems useful for me as it makes connection management more
>> > > > > >> > > > > > > > > robust and predictable.
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > I agree with Pavel, that we should print warning when heartbeat
>> > > > > >> > > > > > > > > period is larger than idle timeout, but I see a problem here as
>> > > > > >> > > > > > > > > idle timeout is configured on server and is not known to clients,
>> > > > > >> > > > > > > > > while heartbeats configured on clients and their period is not
>> > > > > >> > > > > > > > > known to the server. Maybe clients should pass this information on
>> > > > > >> > > > > > > > > to the handshake.
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Regarding Python and PHP clients - can not we use some kind of
>> > > > > >> > > > > > > > > timers to implement this feature?
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > Best Regards,
>> > > > > >> > > > > > > > > Igor
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
>> > > > > >> > > > > > > > > wrote:
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > Maksim, agree. Let's not be too clever and only log a warning.
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
>> > > > > >> > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we should change
>> > > > > >> > > > > > > > > > > the way it works (or did I misunderstand you?)
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > Of course, enabling heartbeats means that otherwise idle clients
>> > > > > >> > > > > > > > > > > will no longer be disconnected by the server.
>> > > > > >> > > > > > > > > > > I think we should cross-link those properties in the documentation
>> > > > > >> > > > > > > > > > > and explain this behavior.
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com>
>> > > > > >> > > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
>> > > > > >> > > > > > > > > > >> not zero, server disconnects idle clients
>> > > > > >> > > > > > > > > > >> >>
>> > > > > >> > > > > > > > > > >> But I suppose it would be great to have:
>> > > > > >> > > > > > > > > > >> 1. If client supports keep alive, use
>> > > idleTimeout
>> > > > > >> > > > > > > > > > >> 2. If not, do not use it.
>> > > > > >> > > > > > > > > > >>
>> > > > > >> > > > > > > > > > >> But I am not sure if it is correct or not.
>> > > > > >> > > > > > > > > > >>
>> > > > > >> > > > > > > > > > >> On Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org> wrote:
>> > > > > >> > > > > > > > > > >>
>> > > > > >> > > > > > > > > > >> > I believe explicit is better than
>> implicit
>> > :)
>> > > > > Also
>> > > > > >> in
>> > > > > >> > > case
>> > > > > >> > > > > of
>> > > > > >> > > > > > > > > dynamic
>> > > > > >> > > > > > > > > > >> > calculation of timeout, it can change
>> > > > > dynamically,
>> > > > > >> for
>> > > > > >> > > > > example
>> > > > > >> > > > > > > > > > >> restarting a
>> > > > > >> > > > > > > > > > >> > cluster with different configuration
>> should
>> > > > > >> > reconfigure
>> > > > > >> > > > > > clients
>> > > > > >> > > > > > > > too.
>> > > > > >> > > > > > > > > > >> Looks
>> > > > > >> > > > > > > > > > >> > complicated.
>> > > > > >> > > > > > > > > > >> >
>> > > > > >> > > > > > > > > > >> > My vote for WARN + javadocs with
>> mention of
>> > > > this
>> > > > > >> > issue.
>> > > > > >> > > > > > > > > > >> >
>> > > > > >> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel
>> > > Tupitsyn <
>> > > > > >> > > > > > > > ptupitsyn@apache.org
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > wrote:
>> > > > > >> > > > > > > > > > >> >
>> > > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message
>> for
>> > > > > clients
>> > > > > >> > that
>> > > > > >> > > > > > > configure
>> > > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than
>> > idleTimeout
>> > > > on
>> > > > > >> the
>> > > > > >> > > > server
>> > > > > >> > > > > > > side?
>> > > > > >> > > > > > > > > > >> > >
>> > > > > >> > > > > > > > > > >> > > I think we should either log a WARN,
>> or
>> > > > > retrieve
>> > > > > >> > > > > idleTimeout
>> > > > > >> > > > > > > > from
>> > > > > >> > > > > > > > > > >> server
>> > > > > >> > > > > > > > > > >> > > and configure heartbeatTimeout
>> > accordingly
>> > > > > (e.g.
>> > > > > >> > > divide
>> > > > > >> > > > by
>> > > > > >> > > > > > 2).
>> > > > > >> > > > > > > > > > >> > > Thoughts?
>> > > > > >> > > > > > > > > > >> > >
>> > > > > >> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim
>> > > > Timonin <
>> > > > > >> > > > > > > > > > >> timoninmaxim@apache.org>
>> > > > > >> > > > > > > > > > >> > > wrote:
>> > > > > >> > > > > > > > > > >> > >
>> > > > > >> > > > > > > > > > >> > > > Hi Pavel,
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot
>> > that
>> > > > the
>> > > > > >> flag
>> > > > > >> > of
>> > > > > >> > > > > > changed
>> > > > > >> > > > > > > > > > >> topology
>> > > > > >> > > > > > > > > > >> > is
>> > > > > >> > > > > > > > > > >> > > > lazy. Also I missed that the
>> keepAlive
>> > > > > setting
>> > > > > >> is
>> > > > > >> > > > > > configured
>> > > > > >> > > > > > > > on
>> > > > > >> > > > > > > > > > the
>> > > > > >> > > > > > > > > > >> > > client
>> > > > > >> > > > > > > > > > >> > > > side (alternatively to idleTimeout
>> that
>> > > is
>> > > > on
>> > > > > >> the
>> > > > > >> > > > server
>> > > > > >> > > > > > > > side).
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > > > Now I understand, this feature can
>> be
>> > > > helpful
>> > > > > >> > then.
>> > > > > >> > > > > Every
>> > > > > >> > > > > > > > client
>> > > > > >> > > > > > > > > > can
>> > > > > >> > > > > > > > > > >> > > > configure itself in case it's
>> possible
>> > to
>> > > > be
>> > > > > >> idle
>> > > > > >> > > > > > sometimes,
>> > > > > >> > > > > > > > and
>> > > > > >> > > > > > > > > > >> choose
>> > > > > >> > > > > > > > > > >> > > > an appropriate timeout by itself
>> too.
>> > And
>> > > > by
>> > > > > >> > default
>> > > > > >> > > > the
>> > > > > >> > > > > > > > feature
>> > > > > >> > > > > > > > > > >> should
>> > > > > >> > > > > > > > > > >> > > be
>> > > > > >> > > > > > > > > > >> > > > disabled.
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message
>> for
>> > > > > clients
>> > > > > >> > that
>> > > > > >> > > > > > > configure
>> > > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than
>> > idleTimeout
>> > > > on
>> > > > > >> the
>> > > > > >> > > > server
>> > > > > >> > > > > > > side?
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel
>> > > > > Tupitsyn <
>> > > > > >> > > > > > > > > > ptupitsyn@apache.org
>> > > > > >> > > > > > > > > > >> >
>> > > > > >> > > > > > > > > > >> > > > wrote:
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > > > > Ivan,
>> > > > > >> > > > > > > > > > >> > > > >
>> > > > > >> > > > > > > > > > >> > > > > I suggest the following:
>> > > > > >> > > > > > > > > > >> > > > >
>> > > > > >> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
>> > > > > >> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
>> > > > > >> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
>> > > > > >> > > > > > > > > > >> > > > > certain period of time
>> > > > > >> > > > > > > > > > >> > > > > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
>> > > > > >> > > > > > > > > > >> > > > > not zero, server disconnects idle clients
>> > > > > >> > > > > > > > > > >> > > > >
>> > > > > >> > > > > > > > > > >> > > > > This way we don't need server->client keepalives, as you correctly noted.
>> > > > > >> > > > > > > > > > >> > > > >
>> > > > > >> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM
>> Ivan
>> > > > > >> Daschinsky
>> > > > > >> > <
>> > > > > >> > > > > > > > > > >> ivandasch@gmail.com
>> > > > > >> > > > > > > > > > >> > >
>> > > > > >> > > > > > > > > > >> > > > > wrote:
>> > > > > >> > > > > > > > > > >> > > > >
>> > > > > >> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
>> > > > > >> > > > > > > > > > >> > > > > > 1. Client send in handshake
>> flag,
>> > > that
>> > > > it
>> > > > > >> > > supports
>> > > > > >> > > > > > > > > KEEP_ALIVE
>> > > > > >> > > > > > > > > > >> > feature
>> > > > > >> > > > > > > > > > >> > > > and
>> > > > > >> > > > > > > > > > >> > > > > > server takes it into account.
>> > > > > >> > > > > > > > > > >> > > > > > 2. Each request of client can be
>> > > > > >> considered as
>> > > > > >> > > > > > > keep-alive
>> > > > > >> > > > > > > > > > ping.
>> > > > > >> > > > > > > > > > >> > > > > > 3. Client send failure should be
>> > > > > processed
>> > > > > >> > using
>> > > > > >> > > > > retry
>> > > > > >> > > > > > > > > policy.
>> > > > > >> > > > > > > > > > >> > > > > > 4. Server should not send
>> > keep-alive
>> > > > > >> packets,
>> > > > > >> > it
>> > > > > >> > > > is
>> > > > > >> > > > > > > > > redundant,
>> > > > > >> > > > > > > > > > >> but
>> > > > > >> > > > > > > > > > >> > > > server
>> > > > > >> > > > > > > > > > >> > > > > > should track requests from
>> client
>> > and
>> > > > if
>> > > > > >> there
>> > > > > >> > > is
>> > > > > >> > > > no
>> > > > > >> > > > > > > > > requests
>> > > > > >> > > > > > > > > > >> from
>> > > > > >> > > > > > > > > > >> > > > client
>> > > > > >> > > > > > > > > > >> > > > > > with KEEP_ALIVE feature,
>> > > > > >> > > > > > > > > > >> > > > > > automatically close connection
>> and
>> > > free
>> > > > > >> > > resources.
>> > > > > >> > > > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > Similar approach is used in
>> > zookeeper
>> > > > > >> clients.
>> > > > > >> > > > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>> > > > > >> > > > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > Ivan,
>> > > > > >> > > > > > > > > > >> > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > Ideally, the check should come
>> > from
>> > > > > both
>> > > > > >> > > sides.
>> > > > > >> > > > > > > > > > >> > > > > > > - Client periodically sends
>> > > keepalive
>> > > > > to
>> > > > > >> > > server
>> > > > > >> > > > > > > > > > >> > > > > > > - Server periodically sends
>> > > keepalive
>> > > > > to
>> > > > > >> > > client
>> > > > > >> > > > > > > > > > >> > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > Feature flags will be added
>> > > > > accordingly,
>> > > > > >> so
>> > > > > >> > it
>> > > > > >> > > > is
>> > > > > >> > > > > > not
>> > > > > >> > > > > > > > > > >> necessary
>> > > > > >> > > > > > > > > > >> > to
>> > > > > >> > > > > > > > > > >> > > > > > > implement this in all thin
>> > clients.
>> > > > > >> > > > > > > > > > >> > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43
>> AM
>> > > Ivan
>> > > > > >> > > Daschinsky
>> > > > > >> > > > <
>> > > > > >> > > > > > > > > > >> > > ivandasch@gmail.com
>> > > > > >> > > > > > > > > > >> > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > wrote:
>> > > > > >> > > > > > > > > > >> > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > I suppose it is great idea,
>> but
>> > > > this
>> > > > > >> > > > > functionality
>> > > > > >> > > > > > > can
>> > > > > >> > > > > > > > > be
>> > > > > >> > > > > > > > > > >> hard
>> > > > > >> > > > > > > > > > >> > to
>> > > > > >> > > > > > > > > > >> > > > > > > implement
>> > > > > >> > > > > > > > > > >> > > > > > > > for some platforms. I.e.
>> sync
>> > > > python
>> > > > > >> > client
>> > > > > >> > > or
>> > > > > >> > > > > php
>> > > > > >> > > > > > > > > (there
>> > > > > >> > > > > > > > > > >> is no
>> > > > > >> > > > > > > > > > >> > > > real
>> > > > > >> > > > > > > > > > >> > > > > > > > multithreading for python
>> (GIL)
>> > > and
>> > > > > >> php is
>> > > > > >> > > > > single
>> > > > > >> > > > > > > > > threaded
>> > > > > >> > > > > > > > > > >> by
>> > > > > >> > > > > > > > > > >> > > > > design).
>> > > > > >> > > > > > > > > > >> > > > > > > But
>> > > > > >> > > > > > > > > > >> > > > > > > > for async clients it is not
>> > very
>> > > > hard
>> > > > > >> to
>> > > > > >> > > > > > implement.
>> > > > > >> > > > > > > > > > >> > Nevertheless,
>> > > > > >> > > > > > > > > > >> > > > > this
>> > > > > >> > > > > > > > > > >> > > > > > > > feature should be optional,
>> > > because
>> > > > > of
>> > > > > >> > > > possible
>> > > > > >> > > > > > > > > technical
>> > > > > >> > > > > > > > > > >> > > > > limitations.
>> > > > > >> > > > > > > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly
>> for
>> > > > > client
>> > > > > >> > side?
>> > > > > >> > > > Or
>> > > > > >> > > > > > > > servers
>> > > > > >> > > > > > > > > > can
>> > > > > >> > > > > > > > > > >> do
>> > > > > >> > > > > > > > > > >> > > some
>> > > > > >> > > > > > > > > > >> > > > > > > actions
>> > > > > >> > > > > > > > > > >> > > > > > > > if there is no activity from
>> > thin
>> > > > > >> client
>> > > > > >> > > (i.e.
>> > > > > >> > > > > > > closing
>> > > > > >> > > > > > > > > > >> context
>> > > > > >> > > > > > > > > > >> > > and
>> > > > > >> > > > > > > > > > >> > > > > free
>> > > > > >> > > > > > > > > > >> > > > > > > > resources such as queries'
>> > > handles
>> > > > > and
>> > > > > >> so
>> > > > > >> > > on?)
>> > > > > >> > > > > > > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>> > > > > >> > > > > > > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > Hi Maksim,
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > half-state is a possible
>> > > > > situation
>> > > > > >> > when
>> > > > > >> > > an
>> > > > > >> > > > > > > Ignite
>> > > > > >> > > > > > > > > node
>> > > > > >> > > > > > > > > > >> goes
>> > > > > >> > > > > > > > > > >> > > > down
>> > > > > >> > > > > > > > > > >> > > > > or
>> > > > > >> > > > > > > > > > >> > > > > > > > > somehow removes connection
>> > to a
>> > > > > thin
>> > > > > >> > > client
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > Half-open state is also
>> > > possible
>> > > > > >> when,
>> > > > > >> > for
>> > > > > >> > > > > > > example,
>> > > > > >> > > > > > > > an
>> > > > > >> > > > > > > > > > >> > > > intermediate
>> > > > > >> > > > > > > > > > >> > > > > > > > router
>> > > > > >> > > > > > > > > > >> > > > > > > > > is rebooted [1].
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > This is what we seem to
>> have
>> > > > > >> encountered
>> > > > > >> > > > with
>> > > > > >> > > > > > one
>> > > > > >> > > > > > > of
>> > > > > >> > > > > > > > > our
>> > > > > >> > > > > > > > > > >> > > > customers
>> > > > > >> > > > > > > > > > >> > > > > -
>> > > > > >> > > > > > > > > > >> > > > > > > they
>> > > > > >> > > > > > > > > > >> > > > > > > > > have a stable cluster, and
>> > > > > >> long-living
>> > > > > >> > > > > (multiple
>> > > > > >> > > > > > > > days)
>> > > > > >> > > > > > > > > > >> thin
>> > > > > >> > > > > > > > > > >> > > > client
>> > > > > >> > > > > > > > > > >> > > > > > > > > connections which can be
>> idle
>> > > for
>> > > > > >> some
>> > > > > >> > > time.
>> > > > > >> > > > > > > > > > >> > > > > > > > > And only when we send some
>> > data
>> > > > on
>> > > > > >> such
>> > > > > >> > an
>> > > > > >> > > > > idle
>> > > > > >> > > > > > > > > > >> connection do
>> > > > > >> > > > > > > > > > >> > > we
>> > > > > >> > > > > > > > > > >> > > > > > > discover
>> > > > > >> > > > > > > > > > >> > > > > > > > > that it is broken.
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > But with enabled (true
>> by
>> > > > > default)
>> > > > > >> > > > > > > > > partitionAwareness
>> > > > > >> > > > > > > > > > >> > feature
>> > > > > >> > > > > > > > > > >> > > > > > clients
>> > > > > >> > > > > > > > > > >> > > > > > > > can
>> > > > > >> > > > > > > > > > >> > > > > > > > > be notified about topology
>> > > > changes
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > Partition awareness is a
>> > "lazy"
>> > > > > >> > > notification
>> > > > > >> > > > > in
>> > > > > >> > > > > > a
>> > > > > >> > > > > > > > form
>> > > > > >> > > > > > > > > > of
>> > > > > >> > > > > > > > > > >> a
>> > > > > >> > > > > > > > > > >> > > > > response
>> > > > > >> > > > > > > > > > >> > > > > > > > > message flag [2].
>> > > > > >> > > > > > > > > > >> > > > > > > > > You won't get one on an
>> idle
>> > > > > >> connection.
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > the connections are
>> removed
>> > > on
>> > > > > the
>> > > > > >> > > server
>> > > > > >> > > > > side
>> > > > > >> > > > > > > by
>> > > > > >> > > > > > > > > > client
>> > > > > >> > > > > > > > > > >> > idle
>> > > > > >> > > > > > > > > > >> > > > > > timeout
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled
>> by
>> > > > > default.
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > is it OK to keep such
>> > > > connections
>> > > > > >> > alive
>> > > > > >> > > > for
>> > > > > >> > > > > a
>> > > > > >> > > > > > > long
>> > > > > >> > > > > > > > > > time
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > I think it is up to the
>> user.
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > in the case of partition
>> > > > > awareness
>> > > > > >> > > > features
>> > > > > >> > > > > it
>> > > > > >> > > > > > > can
>> > > > > >> > > > > > > > > > lead
>> > > > > >> > > > > > > > > > >> to
>> > > > > >> > > > > > > > > > >> > > > > wasting
>> > > > > >> > > > > > > > > > >> > > > > > > TCP
>> > > > > >> > > > > > > > > > >> > > > > > > > > sockets on Ignite nodes,
>> > can't
>> > > it
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
>> > > > > >> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at
>> 4:01
>> > PM
>> > > > > Maksim
>> > > > > >> > > > Timonin
>> > > > > >> > > > > <
>> > > > > >> > > > > > > > > > >> > > > > > timoninmaxim@apache.org
>> > > > > >> > > > > > > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > Thanks for starting this
>> > > > thread!
>> > > > > >> Can I
>> > > > > >> > > ask
>> > > > > >> > > > > > some
>> > > > > >> > > > > > > > > > >> questions
>> > > > > >> > > > > > > > > > >> > > here
>> > > > > >> > > > > > > > > > >> > > > to
>> > > > > >> > > > > > > > > > >> > > > > > get
>> > > > > >> > > > > > > > > > >> > > > > > > > the
>> > > > > >> > > > > > > > > > >> > > > > > > > > > feature more clearly?
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > As I understand it
>> > correctly,
>> > > > > >> > half-state
>> > > > > >> > > > is
>> > > > > >> > > > > a
>> > > > > >> > > > > > > > > possible
>> > > > > >> > > > > > > > > > >> > > > situation
>> > > > > >> > > > > > > > > > >> > > > > > when
>> > > > > >> > > > > > > > > > >> > > > > > > > an
>> > > > > >> > > > > > > > > > >> > > > > > > > > > Ignite node goes down or
>> > > > somehow
>> > > > > >> > removes
>> > > > > >> > > > > > > > connection
>> > > > > >> > > > > > > > > > to a
>> > > > > >> > > > > > > > > > >> > thin
>> > > > > >> > > > > > > > > > >> > > > > > client.
>> > > > > >> > > > > > > > > > >> > > > > > > > But
>> > > > > >> > > > > > > > > > >> > > > > > > > > > with enabled (true by
>> > > default)
>> > > > > >> > > > > > > partitionAwareness
>> > > > > >> > > > > > > > > > >> feature
>> > > > > >> > > > > > > > > > >> > > > clients
>> > > > > >> > > > > > > > > > >> > > > > > can
>> > > > > >> > > > > > > > > > >> > > > > > > > be
>> > > > > >> > > > > > > > > > >> > > > > > > > > > notified about topology
>> > > > changes.
>> > > > > >> So,
>> > > > > >> > > there
>> > > > > >> > > > > are
>> > > > > >> > > > > > > > > > possible
>> > > > > >> > > > > > > > > > >> > > cases:
>> > > > > >> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects
>> to a
>> > > > > single
>> > > > > >> > node.
>> > > > > >> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes
>> > > > connection
>> > > > > >> from
>> > > > > >> > > > > itself.
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > I like the idea for the
>> > case
>> > > > > with a
>> > > > > >> > > single
>> > > > > >> > > > > > node,
>> > > > > >> > > > > > > > as
>> > > > > >> > > > > > > > > it
>> > > > > >> > > > > > > > > > >> > helps
>> > > > > >> > > > > > > > > > >> > > > fail
>> > > > > >> > > > > > > > > > >> > > > > > > fast.
>> > > > > >> > > > > > > > > > >> > > > > > > > > > But is it OK to connect
>> a
>> > > > client
>> > > > > >> to a
>> > > > > >> > > > single
>> > > > > >> > > > > > > node
>> > > > > >> > > > > > > > > > only?
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > For the second one: you
>> > > mention
>> > > > > >> that a
>> > > > > >> > > > case
>> > > > > >> > > > > > for
>> > > > > >> > > > > > > > the
>> > > > > >> > > > > > > > > > >> second
>> > > > > >> > > > > > > > > > >> > > > option
>> > > > > >> > > > > > > > > > >> > > > > > is
>> > > > > >> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly
>> > idle
>> > > > > >> > connections
>> > > > > >> > > > are
>> > > > > >> > > > > > > > > > especially
>> > > > > >> > > > > > > > > > >> > > > > susceptible
>> > > > > >> > > > > > > > > > >> > > > > > > to
>> > > > > >> > > > > > > > > > >> > > > > > > > > this
>> > > > > >> > > > > > > > > > >> > > > > > > > > > behavior". If I
>> understand
>> > > > > >> correctly
>> > > > > >> > the
>> > > > > >> > > > > > > > connections
>> > > > > >> > > > > > > > > > are
>> > > > > >> > > > > > > > > > >> > > > removed
>> > > > > >> > > > > > > > > > >> > > > > on
>> > > > > >> > > > > > > > > > >> > > > > > > the
>> > > > > >> > > > > > > > > > >> > > > > > > > > > server side by client
>> idle
>> > > > > timeout.
>> > > > > >> > Can
>> > > > > >> > > we
>> > > > > >> > > > > > just
>> > > > > >> > > > > > > > > > >> configure
>> > > > > >> > > > > > > > > > >> > the
>> > > > > >> > > > > > > > > > >> > > > > idle
>> > > > > >> > > > > > > > > > >> > > > > > > > > timeout
>> > > > > >> > > > > > > > > > >> > > > > > > > > > for cases where we
>> really
>> > > need
>> > > > > >> keeping
>> > > > > >> > > > alive
>> > > > > >> > > > > > > idle
>> > > > > >> > > > > > > > > > >> > > connections?
>> > > > > >> > > > > > > > > > >> > > > > Are
>> > > > > >> > > > > > > > > > >> > > > > > > > there
>> > > > > >> > > > > > > > > > >> > > > > > > > > > any other cases with
>> > > > unexpectedly
>> > > > > >> > > dropped
>> > > > > >> > > > > > > > > connections?
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK
>> to
>> > > keep
>> > > > > such
>> > > > > >> > > > > > connections
>> > > > > >> > > > > > > > > alive
>> > > > > >> > > > > > > > > > >> for a
>> > > > > >> > > > > > > > > > >> > > > long
>> > > > > >> > > > > > > > > > >> > > > > > > time?
>> > > > > >> > > > > > > > > > >> > > > > > > > > > Also in the case of
>> > partition
>> > > > > >> > awareness
>> > > > > >> > > > > > features
>> > > > > >> > > > > > > > it
>> > > > > >> > > > > > > > > > can
>> > > > > >> > > > > > > > > > >> > lead
>> > > > > >> > > > > > > > > > >> > > to
>> > > > > >> > > > > > > > > > >> > > > > > > wasting
>> > > > > >> > > > > > > > > > >> > > > > > > > > TCP
>> > > > > >> > > > > > > > > > >> > > > > > > > > > sockets on Ignite nodes,
>> > > can't
>> > > > > it?
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > Thanks!
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at
>> 2:24
>> > > PM
>> > > > > >> Pavel
>> > > > > >> > > > > Tupitsyn
>> > > > > >> > > > > > <
>> > > > > >> > > > > > > > > > >> > > > > > ptupitsyn@apache.org>
>> > > > > >> > > > > > > > > > >> > > > > > > > > > wrote:
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > > >> Igniters,
>> > > > > >> > > > > > > > > > >> > > > > > > > > >>
>> > > > > >> > > > > > > > > > >> > > > > > > > > >> Please review the
>> proposal
>> > > to
>> > > > > add
>> > > > > >> > > > heartbeat
>> > > > > >> > > > > > > > > messages
>> > > > > >> > > > > > > > > > to
>> > > > > >> > > > > > > > > > >> > the
>> > > > > >> > > > > > > > > > >> > > > thin
>> > > > > >> > > > > > > > > > >> > > > > > > > client
>> > > > > >> > > > > > > > > > >> > > > > > > > > >> protocol (both 2.x and
>> > 3.x)
>> > > > and
>> > > > > >> let
>> > > > > >> > me
>> > > > > >> > > > know
>> > > > > >> > > > > > > your
>> > > > > >> > > > > > > > > > >> thoughts:
>> > > > > >> > > > > > > > > > >> > > > > > > > > >>
>> > > > > >> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
>> > > > > >> > > > > > > > > > >> > > > > > > > > >>
>> > > > > >> > > > > > > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > > > --
>> > > > > >> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan
>> > Daschinskiy
>> > > > > >> > > > > > > > > > >> > > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > >
>> > > > > >> > > > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > > > >> > > > > > --
>> > > > > >> > > > > > > > > > >> > > > > > Sincerely yours, Ivan
>> Daschinskiy
>> > > > > >> > > > > > > > > > >> > > > > >
>> > > > > >> > > > > > > > > > >> > > > >
>> > > > > >> > > > > > > > > > >> > > >
>> > > > > >> > > > > > > > > > >> > >
>> > > > > >> > > > > > > > > > >> >
>> > > > > >> > > > > > > > > > >>
>> > > > > >> > > > > > > > > > >>
>> > > > > >> > > > > > > > > > >> --
>> > > > > >> > > > > > > > > > >> Sincerely yours, Ivan Daschinskiy
>> > > > > >> > > > > > > > > > >>
>> > > > > >> > > > > > > > > > >
>> > > > > >> > > > > > > > > >
>> > > > > >> > > > > > > > >
>> > > > > >> > > > > > > >
>> > > > > >> > > > > > >
>> > > > > >> > > > > > >
>> > > > > >> > > > > > > --
>> > > > > >> > > > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > >> > > > > > >
>> > > > > >> > > > > >
>> > > > > >> > > > >
>> > > > > >> > > > >
>> > > > > >> > > > > --
>> > > > > >> > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > >> > > > >
>> > > > > >> > > >
>> > > > > >> > >
>> > > > > >> > >
>> > > > > >> > > --
>> > > > > >> > > Sincerely yours, Ivan Daschinskiy
>> > > > > >> > >
>> > > > > >> >
>> > > > > >>
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Ok, let's keep heartbeatInterval then.
I've updated the code to reflect our recent agreement, please review.
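
For readers following the thread, the interval rule summarized in the quoted message of Feb 15 below can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Ignite code: the class name HeartbeatIntervalSketch, the method resolveInterval, and the lower bound of 1 ms are all hypothetical and do not come from the IEP or the Ignite API.

```java
import java.util.OptionalLong;

/** Hypothetical helper illustrating the rule from the thread; not part of Ignite. */
final class HeartbeatIntervalSketch {
    /** Default interval mentioned in the thread: 1 minute, in milliseconds. */
    static final long DEFAULT_HEARTBEAT_INTERVAL_MS = 60_000L;

    private HeartbeatIntervalSketch() {
        // Static helper only.
    }

    /**
     * Returns the interval at which the client would send heartbeats,
     * or an empty value when heartbeats are disabled.
     *
     * @param heartbeatsEnabled   client-side switch
     * @param serverIdleTimeoutMs idle timeout reported by the server, 0 if disabled
     * @param defaultIntervalMs   fallback interval when the server has no idle timeout
     */
    static OptionalLong resolveInterval(boolean heartbeatsEnabled,
                                        long serverIdleTimeoutMs,
                                        long defaultIntervalMs) {
        if (!heartbeatsEnabled)
            return OptionalLong.empty();

        // Rule from the thread: one third of the server idle timeout, else the default.
        long interval = serverIdleTimeoutMs > 0 ? serverIdleTimeoutMs / 3 : defaultIntervalMs;

        // Assumption (not discussed in the thread): never return a zero interval
        // when the server idle timeout is smaller than 3 ms.
        return OptionalLong.of(Math.max(1L, interval));
    }

    public static void main(String[] args) {
        // Server idle timeout of 30 s: heartbeats would go out every 10 s.
        System.out.println(resolveInterval(true, 30_000L, DEFAULT_HEARTBEAT_INTERVAL_MS));
        // No server idle timeout: fall back to the default interval.
        System.out.println(resolveInterval(true, 0L, DEFAULT_HEARTBEAT_INTERVAL_MS));
    }
}
```

Dividing by three leaves room for a couple of delayed or lost heartbeats before the server-side idle timeout would close the connection.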

On Tue, Feb 15, 2022 at 8:28 PM Ivan Daschinsky <iv...@gmail.com> wrote:

> I personally prefer heartbeatInterval
>
> On Tue, Feb 15, 2022 at 18:25, Pavel Tupitsyn <pt...@apache.org> wrote:
>
> > > What about "keepAlive", "keepAliveInterval" then? It looks more common
> > > and matches the IEP title :)
> > According to Google, HeartbeatInterval has ~169K results, and
> > KeepAliveInterval has ~110K :)
> >
> > In my experience, both are well understood. I am equally willing to use
> > either of them.
> > Any other opinions?
> >
> > On Tue, Feb 15, 2022 at 6:11 PM Maksim Timonin <ti...@apache.org>
> > wrote:
> >
> > > > What about "keepAlive", "keepAliveInterval" then? It looks more common
> > > > and matches the IEP title :)
> > >
> > > On Tue, Feb 15, 2022 at 5:54 PM Pavel Tupitsyn <pt...@apache.org>
> > > wrote:
> > >
> > > > To summarize, we add two properties to the ClientConfiguration:
> > > > bool heartbeatsEnabled = true;
> > > > long defaultHeartbeatInterval = 60_000; // Default 1 minute, used
> > > >
> > > > Logic:
> > > > > if (heartbeatsEnabled) {
> > > > >   heartbeatInterval = serverIdleTimeout > 0
> > > > >       ? serverIdleTimeout / 3
> > > > >       : defaultHeartbeatInterval;
> > > > > }
> > > >
> > > >
> > > > Thoughts, objections?
> > > >
> > > > > On Tue, Feb 15, 2022 at 4:32 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > > wrote:
> > > >
> > > > > > Pavel, sorry, I've made a mistake. But the current behaviour is OK for me.
> > > > > > This timeout cannot be changed on the server side at runtime. But we can
> > > > > > simplify the protocol and use just one opcode and message.
> > > > >
> > > > > > On Tue, Feb 15, 2022 at 14:54, Ivan Daschinsky <iv...@gmail.com> wrote:
> > > > >
> > > > > > > Idle timeout can't change, why send it back with every heartbeat
> > > > > > > response?
> > > > > > Maybe I am wrong, but this is the behaviour I see in the code. But if I
> > > > > > am wrong, then this behaviour is OK for me.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > >
> > > > > >> Ivan, I mostly agree with your proposal, except this point:
> > > > > >>
> > > > > >> > Response to heartbeat request -- is idle timeout
> > > > > >> Idle timeout can't change, why send it back with every heartbeat
> > > > > >> response?
> > > > > >>
> > > > > >> > possible cases with cluster restart, upgrade
> > > > > >> In those cases, a new connection will be established, and we'll
> > > > > >> retrieve the new timeout after the handshake.
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <
> > > > > timoninmaxim@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Ivan,
> > > > > >> >
> > > > > >> > Cases you described sound reasonable to me. Then the client should
> > > > > >> > just set up the `keepAlive` flag, and it just works.
> > > > > >> >
> > > > > >> > So, there are 3 branches:
> > > > > >> > 1. Users don't configure keepAlive at all.
> > > > > >> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
> > > > > >> > 3. Users configure keepAlive (boolean).
> > > > > >> >
> > > > > >> > AFAIU, Pavel's proposal covers only the second case. But the 2nd
> > > > > >> > and 3rd cases do not conflict with each other. I think that for both
> > > > > >> > branches, the cluster should respond with the idleTimeout value to
> > > > > >> > every keep-alive client request, because of possible cases such as
> > > > > >> > cluster restart, upgrade, etc. Clients should check every response
> > > > > >> > for a changed idleTimeout: in the 2nd case, write a WARN message;
> > > > > >> > in the 3rd case, reconfigure themselves.
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <
> > > > ivandasch@gmail.com>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Regarding discussion here [1]
> > > > > >> > >
> > > > > >> > > I suppose that this feature, although Pavel's initial intention
> > > > > >> > > was different, can drastically improve the usage pattern of thin
> > > > > >> > > clients and open up a lot of opportunities if the following is done:
> > > > > >> > >
> > > > > >> > > 1. GridNioServer has a great feature -- idle timeout. If the server
> > > > > >> > > does not receive anything from a client, the client will be kicked off.
> > > > > >> > >     But there are some scenarios that make this feature impossible
> > > > > >> > > to use:
> > > > > >> > > a. Multiple workers waiting for batch tasks with a relatively low
> > > > > >> > > request rate -- these services will often be kicked off and must
> > > > > >> > > reconnect. To prevent this behaviour, the user must implement some
> > > > > >> > > kind of heartbeating themselves.
> > > > > >> > > b. Quite often a user may want to implement the leader-follower
> > > > > >> > > pattern for HA services, so followers will also be considered idle.
> > > > > >> > > Kicking off these followers is not acceptable, so the user also has
> > > > > >> > > to implement heartbeating themselves.
> > > > > >> > >
> > > > > >> > > My proposition is:
> > > > > >> > > 1. Add two settings -- a flag to enable/disable heartbeats, and a
> > > > > >> > > very optional heartbeat timeout. Set enabled to true by default, and
> > > > > >> > > the timeout to the default heartbeat timeout.
> > > > > >> > > 2. If server and client both support this feature, and heartbeats
> > > > > >> > > are not explicitly disabled on the client side:
> > > > > >> > > 3. Response to heartbeat request -- is idle timeout. If idle timeout
> > > > > >> > > is set on the server side, set the heartbeat timeout to one-third of
> > > > > >> > > it; otherwise set it to the default or specified value.
> > > > > >> > >
> > > > > >> > > Pros:
> > > > > >> > > 1. Easy to set up -- just a flag on the client side and a timeout
> > > > > >> > > on the server side.
> > > > > >> > > 2. Hard to configure improperly, i.e. to set a heartbeat timeout that
> > > > > >> > > is not short enough to prevent being kicked out by the server.
> > > > > >> > > 3. If the user just wants heartbeats without setting idle timeout --
> > > > > >> > > heartbeats are on by default, with a reasonable timeout.
> > > > > >> > >
> > > > > >> > > Cons:
> > > > > >> > > 1. If someone relies on the old behavior and just wants to drop
> > > > > >> > > clients on timeout -- this will not work without reconfiguring; they
> > > > > >> > > should disable heartbeats.
> > > > > >> > > But I cannot even imagine that someone would find this behaviour
> > > > > >> > > desirable. I strongly believe that this behaviour prevents users from
> > > > > >> > > using idleTimeout on the server side.
> > > > > >> > >
> > > > > >> > > [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
> > > > > >> > >
> > > > > >> > > On Fri, Feb 11, 2022 at 10:58, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > >> > >
> > > > > >> > > > I've prepared a PR, please have a look:
> > > > > >> > > > https://github.com/apache/ignite/pull/9817
> > > > > >> > > >
> > > > > >> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <
> > > > > ivandasch@gmail.com
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > I see potential in this feature, especially if we use something
> > > > > >> > > > > like continuous query. Stale clients can consume a lot of resources
> > > > > >> > > > > and it is worth kicking these clients out.
> > > > > >> > > > >
> > > > > >> > > > > On Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > >> > > > >
> > > > > >> > > > > > > If we use new approach, we can reduce this timeout.
> > But
> > > > this
> > > > > >> can
> > > > > >> > > > affect
> > > > > >> > > > > > old clients.
> > > > > >> > > > > >
> > > > > >> > > > > > idleTimeout is disabled by default, we are not going
> to
> > > > change
> > > > > >> > this.
> > > > > >> > > > > >
> > > > > >> > > > > > > Also, let's think about that sending heartbeats and
> > > > interval
> > > > > >> of
> > > > > >> > > > sending
> > > > > >> > > > > > > heartbeats could be calculated on the server side
> > (i.e.
> > > > one
> > > > > >> third
> > > > > >> > > of
> > > > > >> > > > > idle
> > > > > >> > > > > > > timeout) and sent to the client during handshake.
> > > > > >> > > > > > > Also we can introduce something like a negotiation
> > > > mechanism
> > > > > >> as
> > > > > >> > in
> > > > > >> > > > > > > zookeeper.
> > > > > >> > > > > >
> > > > > >> > > > > > I tend to agree with Maksim here, let's keep it simple
> > and
> > > > > >> > explicit.
> > > > > >> > > > > > Log a warning, but don't do anything clever.
> > > > > >> > > > > >
> > > > > >> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <
> > > > > >> > ivandasch@gmail.com>
> > > > > >> > > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > >> idleTimeout already exists, I don't think we
> should
> > > > > change
> > > > > >> the
> > > > > >> > > way
> > > > > >> > > > > it
> > > > > >> > > > > > > works (or did I misunderstand you?)
> > > > > >> > > > > > > If we use new approach, we can reduce this timeout.
> > But
> > > > this
> > > > > >> can
> > > > > >> > > > affect
> > > > > >> > > > > > old
> > > > > >> > > > > > > clients.
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > Also, let's think about that sending heartbeats and
> > > > interval
> > > > > >> of
> > > > > >> > > > sending
> > > > > >> > > > > > > heartbeats could be calculated on the server side
> > (i.e.
> > > > one
> > > > > >> third
> > > > > >> > > of
> > > > > >> > > > > idle
> > > > > >> > > > > > > timeout) and sent to the client
> > > > > >> > > > > > > during handshake.
> > > > > >> > > > > > > Also we can introduce something like a negotiation
> > > > mechanism
> > > > > >> as
> > > > > >> > in
> > > > > >> > > > > > > zookeeper.
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > Igor,
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Maybe clients should pass this information on to
> > the
> > > > > >> > handshake.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > Do you think we should log a mismatched timeout
> > > warning
> > > > on
> > > > > >> the
> > > > > >> > > > > server,
> > > > > >> > > > > > > not
> > > > > >> > > > > > > > on the client?
> > > > > >> > > > > > > > Or should we do both?
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT
> > and
> > > > > some
> > > > > >> > other
> > > > > >> > > > > > details
> > > > > >> > > > > > > > discussed above.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <
> > > > > >> isapego@apache.org
> > > > > >> > >
> > > > > >> > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Feature seems useful for me as it makes
> connection
> > > > > >> management
> > > > > >> > > > more
> > > > > >> > > > > > > robust
> > > > > >> > > > > > > > > and
> > > > > >> > > > > > > > > predictable.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > I agree with Pavel, that we should print warning
> > > when
> > > > > >> > heartbeat
> > > > > >> > > > > > period
> > > > > >> > > > > > > is
> > > > > >> > > > > > > > > larger than
> > > > > >> > > > > > > > > idle timeout, but I see a problem here as idle
> > > timeout
> > > > > is
> > > > > >> > > > > configured
> > > > > >> > > > > > on
> > > > > >> > > > > > > > > server and is not
> > > > > >> > > > > > > > > known to clients, while heartbeats configured on
> > > > clients
> > > > > >> and
> > > > > >> > > > their
> > > > > >> > > > > > > period
> > > > > >> > > > > > > > > is not known
> > > > > >> > > > > > > > > to the server. Maybe clients should pass this
> > > > > information
> > > > > >> on
> > > > > >> > to
> > > > > >> > > > the
> > > > > >> > > > > > > > > handshake.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Regarding Python and PHP clients - can not we
> use
> > > some
> > > > > >> kind
> > > > > >> > of
> > > > > >> > > > > timers
> > > > > >> > > > > > > to
> > > > > >> > > > > > > > > implement
> > > > > >> > > > > > > > > this feature?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Best Regards,
> > > > > >> > > > > > > > > Igor
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > > > >> > > > > ptupitsyn@apache.org>
> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > Maksim, agree. Let's not be too clever and
> only
> > > log
> > > > a
> > > > > >> > > warning.
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn
> <
> > > > > >> > > > > > ptupitsyn@apache.org>
> > > > > >> > > > > > > > > > wrote:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > Ivan, idleTimeout already exists, I don't
> > think
> > > we
> > > > > >> should
> > > > > >> > > > > change
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > way
> > > > > >> > > > > > > > > > > it works (or did I misunderstand you?)
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Of course, enabling heartbeats means that
> > > > otherwise
> > > > > >> idle
> > > > > >> > > > > clients
> > > > > >> > > > > > > will
> > > > > >> > > > > > > > > no
> > > > > >> > > > > > > > > > > longer be disconnected by the server.
> > > > > >> > > > > > > > > > > I think we should cross-link those
> properties
> > in
> > > > the
> > > > > >> > > > > > documentation
> > > > > >> > > > > > > > and
> > > > > >> > > > > > > > > > > explain this behavior.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan
> > Daschinsky <
> > > > > >> > > > > > > ivandasch@gmail.com>
> > > > > >> > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > >> >>3. Already implemented: when
> > > > > >> > > > > > > > > ClientConnectorConfiguration#idleTimeout
> > > > > >> > > > > > > > > > is
> > > > > >> > > > > > > > > > >> not zero, server disconnects idle clients
> > > > > >> > > > > > > > > > >> >>
> > > > > >> > > > > > > > > > >> But I suppose it would be great to have:
> > > > > >> > > > > > > > > > >> 1. If client supports keep alive, use
> > > idleTimeout
> > > > > >> > > > > > > > > > >> 2. If not, do not use it.
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> But I am not sure if it is correct or not.
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> On Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org> wrote:
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> > I believe explicit is better than
> implicit
> > :)
> > > > > Also
> > > > > >> in
> > > > > >> > > case
> > > > > >> > > > > of
> > > > > >> > > > > > > > > dynamic
> > > > > >> > > > > > > > > > >> > calculation of timeout, it can change
> > > > > dynamically,
> > > > > >> for
> > > > > >> > > > > example
> > > > > >> > > > > > > > > > >> restarting a
> > > > > >> > > > > > > > > > >> > cluster with different configuration
> should
> > > > > >> > reconfigure
> > > > > >> > > > > > clients
> > > > > >> > > > > > > > too.
> > > > > >> > > > > > > > > > >> Looks
> > > > > >> > > > > > > > > > >> > complicated.
> > > > > >> > > > > > > > > > >> >
> > > > > >> > > > > > > > > > >> > My vote for WARN + javadocs with mention
> of
> > > > this
> > > > > >> > issue.
> > > > > >> > > > > > > > > > >> >
> > > > > >> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel
> > > Tupitsyn <
> > > > > >> > > > > > > > ptupitsyn@apache.org
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > wrote:
> > > > > >> > > > > > > > > > >> >
> > > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message
> for
> > > > > clients
> > > > > >> > that
> > > > > >> > > > > > > configure
> > > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than
> > idleTimeout
> > > > on
> > > > > >> the
> > > > > >> > > > server
> > > > > >> > > > > > > side?
> > > > > >> > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > >> > > I think we should either log a WARN, or
> > > > > retrieve
> > > > > >> > > > > idleTimeout
> > > > > >> > > > > > > > from
> > > > > >> > > > > > > > > > >> server
> > > > > >> > > > > > > > > > >> > > and configure heartbeatTimeout
> > accordingly
> > > > > (e.g.
> > > > > >> > > divide
> > > > > >> > > > by
> > > > > >> > > > > > 2).
> > > > > >> > > > > > > > > > >> > > Thoughts?
> > > > > >> > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim
> > > > Timonin <
> > > > > >> > > > > > > > > > >> timoninmaxim@apache.org>
> > > > > >> > > > > > > > > > >> > > wrote:
> > > > > >> > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > >> > > > Hi Pavel,
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot
> > that
> > > > the
> > > > > >> flag
> > > > > >> > of
> > > > > >> > > > > > changed
> > > > > >> > > > > > > > > > >> topology
> > > > > >> > > > > > > > > > >> > is
> > > > > >> > > > > > > > > > >> > > > lazy. Also I missed that the
> keepAlive
> > > > > setting
> > > > > >> is
> > > > > >> > > > > > configured
> > > > > >> > > > > > > > on
> > > > > >> > > > > > > > > > the
> > > > > >> > > > > > > > > > >> > > client
> > > > > >> > > > > > > > > > >> > > > side (alternatively to idleTimeout
> that
> > > is
> > > > on
> > > > > >> the
> > > > > >> > > > server
> > > > > >> > > > > > > > side).
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > > > Now I understand, this feature can be
> > > > helpful
> > > > > >> > then.
> > > > > >> > > > > Every
> > > > > >> > > > > > > > client
> > > > > >> > > > > > > > > > can
> > > > > >> > > > > > > > > > >> > > > configure itself in case it's
> possible
> > to
> > > > be
> > > > > >> idle
> > > > > >> > > > > > sometimes,
> > > > > >> > > > > > > > and
> > > > > >> > > > > > > > > > >> choose
> > > > > >> > > > > > > > > > >> > > > an appropriate timeout by itself too.
> > And
> > > > by
> > > > > >> > default
> > > > > >> > > > the
> > > > > >> > > > > > > > feature
> > > > > >> > > > > > > > > > >> should
> > > > > >> > > > > > > > > > >> > > be
> > > > > >> > > > > > > > > > >> > > > disabled.
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message
> for
> > > > > clients
> > > > > >> > that
> > > > > >> > > > > > > configure
> > > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than
> > idleTimeout
> > > > on
> > > > > >> the
> > > > > >> > > > server
> > > > > >> > > > > > > side?
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel
> > > > > Tupitsyn <
> > > > > >> > > > > > > > > > ptupitsyn@apache.org
> > > > > >> > > > > > > > > > >> >
> > > > > >> > > > > > > > > > >> > > > wrote:
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > > > > Ivan,
> > > > > >> > > > > > > > > > >> > > > >
> > > > > >> > > > > > > > > > >> > > > > I suggest the following:
> > > > > >> > > > > > > > > > >> > > > >
> > > > > >> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature
> > > flag,
> > > > > >> which
> > > > > >> > > means
> > > > > >> > > > > it
> > > > > >> > > > > > > > > accepts
> > > > > >> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > > > > >> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when
> > the
> > > > > >> > connection
> > > > > >> > > is
> > > > > >> > > > > > idle
> > > > > >> > > > > > > > for
> > > > > >> > > > > > > > > a
> > > > > >> > > > > > > > > > >> > > > > certain period of time
> > > > > >> > > > > > > > > > >> > > > > 3. Already implemented: when
> > > > > >> > > > > > > > > > >> ClientConnectorConfiguration#idleTimeout
> > > > > >> > > > > > > > > > >> > > is
> > > > > >> > > > > > > > > > >> > > > > not zero, server disconnects idle
> > > clients
> > > > > >> > > > > > > > > > >> > > > >
> > > > > >> > > > > > > > > > >> > > > > This way we don't need
> server->client
> > > > > >> > keepalives,
> > > > > >> > > as
> > > > > >> > > > > you
> > > > > >> > > > > > > > > > correctly
> > > > > >> > > > > > > > > > >> > > noted.
> > > > > >> > > > > > > > > > >> > > > >
> > > > > >> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM
> Ivan
> > > > > >> Daschinsky
> > > > > >> > <
> > > > > >> > > > > > > > > > >> ivandasch@gmail.com
> > > > > >> > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > >> > > > > wrote:
> > > > > >> > > > > > > > > > >> > > > >
> > > > > >> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > > > >> > > > > > > > > > >> > > > > > 1. Client send in handshake flag,
> > > that
> > > > it
> > > > > >> > > supports
> > > > > >> > > > > > > > > KEEP_ALIVE
> > > > > >> > > > > > > > > > >> > feature
> > > > > >> > > > > > > > > > >> > > > and
> > > > > >> > > > > > > > > > >> > > > > > server takes it into account.
> > > > > >> > > > > > > > > > >> > > > > > 2. Each request of client can be
> > > > > >> considered as
> > > > > >> > > > > > > keep-alive
> > > > > >> > > > > > > > > > ping.
> > > > > >> > > > > > > > > > >> > > > > > 3. Client send failure should be
> > > > > processed
> > > > > >> > using
> > > > > >> > > > > retry
> > > > > >> > > > > > > > > policy.
> > > > > >> > > > > > > > > > >> > > > > > 4. Server should not send
> > keep-alive
> > > > > >> packets,
> > > > > >> > it
> > > > > >> > > > is
> > > > > >> > > > > > > > > redundant,
> > > > > >> > > > > > > > > > >> but
> > > > > >> > > > > > > > > > >> > > > server
> > > > > >> > > > > > > > > > >> > > > > > should track requests from client
> > and
> > > > if
> > > > > >> there
> > > > > >> > > is
> > > > > >> > > > no
> > > > > >> > > > > > > > > requests
> > > > > >> > > > > > > > > > >> from
> > > > > >> > > > > > > > > > >> > > > client
> > > > > >> > > > > > > > > > >> > > > > > with KEEP_ALIVE feature,
> > > > > >> > > > > > > > > > >> > > > > > automatically close connection
> and
> > > free
> > > > > >> > > resources.
> > > > > >> > > > > > > > > > >> > > > > >
> > > > > >> > > > > > > > > > >> > > > > > Similar approach is used in
> > zookeeper
> > > > > >> clients.
> > > > > >> > > > > > > > > > >> > > > > >
> > > > > >> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > >> > > > > > > > > > >> > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > Ivan,
> > > > > >> > > > > > > > > > >> > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > Ideally, the check should come
> > from
> > > > > both
> > > > > >> > > sides.
> > > > > >> > > > > > > > > > >> > > > > > > - Client periodically sends
> > > keepalive
> > > > > to
> > > > > >> > > server
> > > > > >> > > > > > > > > > >> > > > > > > - Server periodically sends
> > > keepalive
> > > > > to
> > > > > >> > > client
> > > > > >> > > > > > > > > > >> > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > Feature flags will be added
> > > > > accordingly,
> > > > > >> so
> > > > > >> > it
> > > > > >> > > > is
> > > > > >> > > > > > not
> > > > > >> > > > > > > > > > >> necessary
> > > > > >> > > > > > > > > > >> > to
> > > > > >> > > > > > > > > > >> > > > > > > implement this in all thin
> > clients.
> > > > > >> > > > > > > > > > >> > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM
> > > Ivan
> > > > > >> > > Daschinsky
> > > > > >> > > > <
> > > > > >> > > > > > > > > > >> > > ivandasch@gmail.com
> > > > > >> > > > > > > > > > >> > > > >
> > > > > >> > > > > > > > > > >> > > > > > > wrote:
> > > > > >> > > > > > > > > > >> > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > I suppose it is great idea,
> but
> > > > this
> > > > > >> > > > > functionality
> > > > > >> > > > > > > can
> > > > > >> > > > > > > > > be
> > > > > >> > > > > > > > > > >> hard
> > > > > >> > > > > > > > > > >> > to
> > > > > >> > > > > > > > > > >> > > > > > > implement
> > > > > >> > > > > > > > > > >> > > > > > > > for some platforms. I.e. sync
> > > > python
> > > > > >> > client
> > > > > >> > > or
> > > > > >> > > > > php
> > > > > >> > > > > > > > > (there
> > > > > >> > > > > > > > > > >> is no
> > > > > >> > > > > > > > > > >> > > > real
> > > > > >> > > > > > > > > > >> > > > > > > > multithreading for python
> (GIL)
> > > and
> > > > > >> php is
> > > > > >> > > > > single
> > > > > >> > > > > > > > > threaded
> > > > > >> > > > > > > > > > >> by
> > > > > >> > > > > > > > > > >> > > > > design).
> > > > > >> > > > > > > > > > >> > > > > > > But
> > > > > >> > > > > > > > > > >> > > > > > > > for async clients it is not
> > very
> > > > hard
> > > > > >> to
> > > > > >> > > > > > implement.
> > > > > >> > > > > > > > > > >> > Nevertheless,
> > > > > >> > > > > > > > > > >> > > > > this
> > > > > >> > > > > > > > > > >> > > > > > > > feature should be optional,
> > > because
> > > > > of
> > > > > >> > > > possible
> > > > > >> > > > > > > > > technical
> > > > > >> > > > > > > > > > >> > > > > limitations.
> > > > > >> > > > > > > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly
> for
> > > > > client
> > > > > >> > side?
> > > > > >> > > > Or
> > > > > >> > > > > > > > servers
> > > > > >> > > > > > > > > > can
> > > > > >> > > > > > > > > > >> do
> > > > > >> > > > > > > > > > >> > > some
> > > > > >> > > > > > > > > > >> > > > > > > actions
> > > > > >> > > > > > > > > > >> > > > > > > > if there is no activity from
> > thin
> > > > > >> client
> > > > > >> > > (i.e.
> > > > > >> > > > > > > closing
> > > > > >> > > > > > > > > > >> context
> > > > > >> > > > > > > > > > >> > > and
> > > > > >> > > > > > > > > > >> > > > > free
> > > > > >> > > > > > > > > > >> > > > > > > > resources such as queries'
> > > handles
> > > > > and
> > > > > >> so
> > > > > >> > > on?)
> > > > > >> > > > > > > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > >> > > > > > > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > half-state is a possible
> > > > > situation
> > > > > >> > when
> > > > > >> > > an
> > > > > >> > > > > > > Ignite
> > > > > >> > > > > > > > > node
> > > > > >> > > > > > > > > > >> goes
> > > > > >> > > > > > > > > > >> > > > down
> > > > > >> > > > > > > > > > >> > > > > or
> > > > > >> > > > > > > > > > >> > > > > > > > > somehow removes connection
> > to a
> > > > > thin
> > > > > >> > > client
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > Half-open state is also
> > > possible
> > > > > >> when,
> > > > > >> > for
> > > > > >> > > > > > > example,
> > > > > >> > > > > > > > an
> > > > > >> > > > > > > > > > >> > > > intermediate
> > > > > >> > > > > > > > > > >> > > > > > > > router
> > > > > >> > > > > > > > > > >> > > > > > > > > is rebooted [1].
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > This is what we seem to
> have
> > > > > >> encountered
> > > > > >> > > > with
> > > > > >> > > > > > one
> > > > > >> > > > > > > of
> > > > > >> > > > > > > > > our
> > > > > >> > > > > > > > > > >> > > > customers
> > > > > >> > > > > > > > > > >> > > > > -
> > > > > >> > > > > > > > > > >> > > > > > > they
> > > > > >> > > > > > > > > > >> > > > > > > > > have a stable cluster, and
> > > > > >> long-living
> > > > > >> > > > > (multiple
> > > > > >> > > > > > > > days)
> > > > > >> > > > > > > > > > >> thin
> > > > > >> > > > > > > > > > >> > > > client
> > > > > >> > > > > > > > > > >> > > > > > > > > connections which can be
> idle
> > > for
> > > > > >> some
> > > > > >> > > time.
> > > > > >> > > > > > > > > > >> > > > > > > > > And only when we send some
> > data
> > > > on
> > > > > >> such
> > > > > >> > an
> > > > > >> > > > > idle
> > > > > >> > > > > > > > > > >> connection do
> > > > > >> > > > > > > > > > >> > > we
> > > > > >> > > > > > > > > > >> > > > > > > discover
> > > > > >> > > > > > > > > > >> > > > > > > > > that it is broken.
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > But with enabled (true by
> > > > > default)
> > > > > >> > > > > > > > > partitionAwareness
> > > > > >> > > > > > > > > > >> > feature
> > > > > >> > > > > > > > > > >> > > > > > clients
> > > > > >> > > > > > > > > > >> > > > > > > > can
> > > > > >> > > > > > > > > > >> > > > > > > > > be notified about topology
> > > > changes
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > Partition awareness is a
> > "lazy"
> > > > > >> > > notification
> > > > > >> > > > > in
> > > > > >> > > > > > a
> > > > > >> > > > > > > > form
> > > > > >> > > > > > > > > > of
> > > > > >> > > > > > > > > > >> a
> > > > > >> > > > > > > > > > >> > > > > response
> > > > > >> > > > > > > > > > >> > > > > > > > > message flag [2].
> > > > > >> > > > > > > > > > >> > > > > > > > > You won't get one on an
> idle
> > > > > >> connection.
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > the connections are
> removed
> > > on
> > > > > the
> > > > > >> > > server
> > > > > >> > > > > side
> > > > > >> > > > > > > by
> > > > > >> > > > > > > > > > client
> > > > > >> > > > > > > > > > >> > idle
> > > > > >> > > > > > > > > > >> > > > > > timeout
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by
> > > > > default.
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > is it OK to keep such
> > > > connections
> > > > > >> > alive
> > > > > >> > > > for
> > > > > >> > > > > a
> > > > > >> > > > > > > long
> > > > > >> > > > > > > > > > time
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > I think it is up to the
> user.
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > in the case of partition
> > > > > awareness
> > > > > >> > > > features
> > > > > >> > > > > it
> > > > > >> > > > > > > can
> > > > > >> > > > > > > > > > lead
> > > > > >> > > > > > > > > > >> to
> > > > > >> > > > > > > > > > >> > > > > wasting
> > > > > >> > > > > > > > > > >> > > > > > > TCP
> > > > > >> > > > > > > > > > >> > > > > > > > > sockets on Ignite nodes,
> > can't
> > > it
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > >> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > > >> > > > > > > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here
> > > > > >> > > > > > > > > > >> > > > > > > > > > to get the feature more clearly?
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > As I understand it correctly, half-state is a possible situation
> > > > > >> > > > > > > > > > >> > > > > > > > > > when an Ignite node goes down or somehow removes connection to a
> > > > > >> > > > > > > > > > >> > > > > > > > > > thin client. But with enabled (true by default)
> > > > > >> > > > > > > > > > >> > > > > > > > > > partitionAwareness feature clients can be notified about
> > > > > >> > > > > > > > > > >> > > > > > > > > > topology changes. So, there are possible cases:
> > > > > >> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > > > >> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps
> > > > > >> > > > > > > > > > >> > > > > > > > > > fail fast. But is it OK to connect a client to a single node
> > > > > >> > > > > > > > > > >> > > > > > > > > > only?
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second
> > > > > >> > > > > > > > > > >> > > > > > > > > > option is "Long-living and mostly idle connections are
> > > > > >> > > > > > > > > > >> > > > > > > > > > especially susceptible to this behavior". If I understand
> > > > > >> > > > > > > > > > >> > > > > > > > > > correctly the connections are removed on the server side by
> > > > > >> > > > > > > > > > >> > > > > > > > > > client idle timeout. Can we just configure the idle timeout for
> > > > > >> > > > > > > > > > >> > > > > > > > > > cases where we really need keeping alive idle connections? Are
> > > > > >> > > > > > > > > > >> > > > > > > > > > there any other cases with unexpectedly dropped connections?
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such connections alive for a long
> > > > > >> > > > > > > > > > >> > > > > > > > > > time? Also in the case of partition awareness features it can
> > > > > >> > > > > > > > > > >> > > > > > > > > > lead to wasting TCP sockets on Ignite nodes, can't it?
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > Thanks!
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > >> > > > > > > > > > >> > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > > >> Igniters,
> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > > >> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin
> > > > > >> > > > > > > > > > >> > > > > > > > > >> client protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > > >> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > > > --
> > > > > >> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > >> > > > > > > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > >> > > > > > >
> > > > > >> > > > > > > > > > >> > > > > >
> > > > > >> > > > > > > > > > >> > > > > >
> > > > > >> > > > > > > > > > >> > > > > > --
> > > > > >> > > > > > > > > > >> > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > >> > > > > > > > > > >> > > > > >
> > > > > >> > > > > > > > > > >> > > > >
> > > > > >> > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > >> >
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >> --
> > > > > >> > > > > > > > > > >> Sincerely yours, Ivan Daschinskiy
> > > > > >> > > > > > > > > > >>
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > --
> > > > > >> > > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > >> > > > > > >
> > > > > >> > > > > >
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > --
> > > > > >> > > > > Sincerely yours, Ivan Daschinskiy
> > > > > >> > > > >
> > > > > >> > > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > --
> > > > > >> > > Sincerely yours, Ivan Daschinskiy
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
I personally prefer heartbeatInterval
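
For reference, a minimal Java sketch of the interval selection summarized in the quoted message further down, assuming the names used in that summary (heartbeatsEnabled, defaultHeartbeatInterval, serverIdleTimeout). The class name, the method name, and the choice to return 0 when heartbeats are disabled are hypothetical, and this is not the actual Ignite client code:

```java
/** Hypothetical helper illustrating the proposed heartbeat interval logic. */
final class HeartbeatIntervalSketch {
    /** Default from the quoted summary: 60_000 ms (1 minute). */
    static final long DEFAULT_HEARTBEAT_INTERVAL_MS = 60_000L;

    /**
     * Picks the heartbeat interval in milliseconds.
     * Assumption: 0 means "heartbeats are not sent".
     */
    static long heartbeatInterval(boolean heartbeatsEnabled, long serverIdleTimeoutMs) {
        if (!heartbeatsEnabled)
            return 0L;

        // One third of the server idle timeout when it is set, else the default.
        return serverIdleTimeoutMs > 0
            ? serverIdleTimeoutMs / 3
            : DEFAULT_HEARTBEAT_INTERVAL_MS;
    }

    public static void main(String[] args) {
        System.out.println(heartbeatInterval(true, 9_000L));  // 3000
        System.out.println(heartbeatInterval(true, 0L));      // 60000
        System.out.println(heartbeatInterval(false, 9_000L)); // 0
    }
}
```

Using one third of the idle timeout leaves room for a couple of delayed or lost heartbeats before the server would consider the connection idle.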

Tue, Feb 15, 2022, 18:25 Pavel Tupitsyn <pt...@apache.org>:

> > What about "keepAlive", "keepAliveInterval" then? It looks more common
> > and matches the IEP title :)
> According to Google, HeartbeatInterval has ~169K results, and
> KeepAliveInterval has ~110K :)
>
> In my experience, both are well understood. I am equally willing to use
> either of them.
> Any other opinions?
>
> On Tue, Feb 15, 2022 at 6:11 PM Maksim Timonin <ti...@apache.org>
> wrote:
>
> > > What about "keepAlive", "keepAliveInterval" then? It looks more common
> > > and matches the IEP title :)
> >
> > On Tue, Feb 15, 2022 at 5:54 PM Pavel Tupitsyn <pt...@apache.org>
> > wrote:
> >
> > > To summarize, we add two properties to the ClientConfiguration:
> > > bool heartbeatsEnabled = true;
> > > long defaultHeartbeatInterval = 60_000; // Default 1 minute, used
> > >
> > > Logic:
> > > if (heartbeatsEnabled) {
> > >   heartbeatInterval = serverIdleTimeout > 0 ? serverIdleTimeout / 3 :
> > > defaultHeartbeatInterval;
> > > }
> > >
> > >
> > > Thoughts, objections?
> > >
> > > On Tue, Feb 15, 2022 at 4:32 PM Ivan Daschinsky <iv...@gmail.com>
> > > wrote:
> > >
> > > > > Pavel, sorry, I've made a mistake. But the current behaviour is OK for me.
> > > > > This timeout cannot be changed on the server side at runtime. But we can
> > > > > simplify the protocol and use just one opcode and message.
> > > >
> > > > > Tue, Feb 15, 2022, 14:54 Ivan Daschinsky <iv...@gmail.com>:
> > > >
> > > > > > > > Idle timeout can't change, why send it back with every heartbeat
> > > > > > > > response?
> > > > > > > Maybe I am wrong, but this is the behaviour I see in the code. But if I
> > > > > > > am wrong, this behaviour is OK for me.
> > > > >
> > > > >
> > > > >
> > > > > > > Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <ptupitsyn@apache.org>:
> > > > >
> > > > >> Ivan, I mostly agree with your proposal, except this point:
> > > > >>
> > > > > > >> > Response to heartbeat request -- is idle timeout
> > > > > > >> Idle timeout can't change, why send it back with every heartbeat
> > > > > > >> response?
> > > > > > >>
> > > > > > >> > possible cases with cluster restart, upgrade
> > > > > > >> In those cases, a new connection will be established, and we'll
> > > > > > >> retrieve the new timeout after the handshake.
> > > > >>
> > > > >>
> > > > >> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <
> > > > timoninmaxim@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Ivan,
> > > > >> >
> > > > > > >> > Cases you described sound reasonable to me. Then the client should
> > > > > > >> > just set up the `keepAlive` flag, and it just works.
> > > > >> >
> > > > >> > So, there are 3 branches:
> > > > >> > 1. Users don't configure keepAlive at all.
> > > > >> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
> > > > >> > 3. Users configure keepAlive (boolean).
> > > > >> >
> > > > > > >> > AFAIU, Pavel's proposal is about covering the second case only. But
> > > > > > >> > actually the 2nd and 3rd don't conflict with each other. I think for
> > > > > > >> > both branches, a cluster should respond with the idleTimeout value on
> > > > > > >> > every keep-alive client request, because there are possible cases with
> > > > > > >> > cluster restart, upgrade, etc. Clients should check every response for
> > > > > > >> > a changed idleTimeout: in the 2nd case, write a WARN message, and in
> > > > > > >> > the 3rd, reconfigure themselves.
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <
> > > ivandasch@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Regarding discussion here [1]
> > > > >> > >
> > > > > > >> > > I suppose that this feature, despite the fact that Pavel's initial
> > > > > > >> > > intention was different, can drastically improve the usage pattern of
> > > > > > >> > > thin clients and give a lot of opportunities if the following is done:
> > > > >> > >
> > > > > > >> > > 1. GridNioServer has a great feature -- idle timeout. If a server does
> > > > > > >> > > not receive anything from a client, the client will be kicked off.
> > > > > > >> > >     But there are some scenarios that make the use of this feature
> > > > > > >> > > impossible:
> > > > > > >> > > a. Multiple workers waiting for batch tasks with a relatively low
> > > > > > >> > > request rate -- these services will often be kicked off and must
> > > > > > >> > > reconnect. In order to prevent this behaviour, the user must implement
> > > > > > >> > > a kind of heartbeating themselves.
> > > > > > >> > > b. Quite often a user may want to implement the leader-follower pattern
> > > > > > >> > > for services for HA, so followers will also be considered idle.
> > > > > > >> > > Kicking off these followers is not acceptable, so the user should also
> > > > > > >> > > implement heartbeating themselves.
> > > > >> > >
> > > > >> > > My proposition is:
> > > > > > >> > > 1. Add two flags -- enable/disable heartbeats, and a very optional
> > > > > > >> > > heartbeat timeout. Set enable to true by default, timeout to the
> > > > > > >> > > default heartbeat timeout.
> > > > > > >> > > 2. If server and client both support this feature, and heartbeats are
> > > > > > >> > > not explicitly disabled on the client side:
> > > > > > >> > > 3. Response to heartbeat request -- is idle timeout. If idle timeout is
> > > > > > >> > > set on the server side, set heartbeat timeout to one-third of it;
> > > > > > >> > > otherwise set it to the default or specified value.
> > > > >> > >
> > > > >> > > Pros:
> > > > > > >> > > 1. Easy to set up -- just a flag on the client side and just a timeout
> > > > > > >> > > on the server side.
> > > > > > >> > > 2. Hard to configure improperly, i.e. to set a heartbeat timeout that
> > > > > > >> > > is not short enough to prevent being kicked out by the server.
> > > > > > >> > > 3. If the user just wants heartbeats without setting idle timeout --
> > > > > > >> > > heartbeats are on by default, with a reasonable timeout.
> > > > >> > >
> > > > >> > > Cons:
> > > > > > >> > > 1. If someone relies on the old behavior and just wants to drop
> > > > > > >> > > clients on timeout -- this will not work without reconfiguring; they
> > > > > > >> > > should disable heartbeats.
> > > > > > >> > > But I cannot even imagine that someone will find this behaviour
> > > > > > >> > > desirable. I strongly believe that this behaviour prevents users from
> > > > > > >> > > using idleTimeout on the server side.
> > > > >> > >
> > > > >> > > [1] --
> > > > >> https://github.com/apache/ignite/pull/9817#discussion_r805628955
> > > > >> > >
> > > > > > >> > > Fri, Feb 11, 2022 at 10:58, Pavel Tupitsyn <ptupitsyn@apache.org>:
> > > > >> > >
> > > > >> > > > I've prepared a PR, please have a look:
> > > > >> > > > https://github.com/apache/ignite/pull/9817
> > > > >> > > >
> > > > >> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <
> > > > ivandasch@gmail.com
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > >
> > > > > > >> > > > > I see potential in this feature, especially if we use something like
> > > > > > >> > > > > a continuous query. Stale clients can consume a lot of resources, and
> > > > > > >> > > > > it is worth kicking these clients out.
> > > > >> > > > >
> > > > > > >> > > > > Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <ptupitsyn@apache.org>:
> > > > >> > > > >
> > > > > > >> > > > > > > If we use new approach, we can reduce this timeout. But this can affect
> > > > > > >> > > > > > > old clients.
> > > > > > >> > > > > >
> > > > > > >> > > > > > idleTimeout is disabled by default, we are not going to change this.
> > > > > > >> > > > > >
> > > > > > >> > > > > > > Also, let's think about that sending heartbeats and interval of sending
> > > > > > >> > > > > > > heartbeats could be calculated on the server side (i.e. one third of idle
> > > > > > >> > > > > > > timeout) and sent to the client during handshake.
> > > > > > >> > > > > > > Also we can introduce something like a negotiation mechanism as in
> > > > > > >> > > > > > > zookeeper.
> > > > > > >> > > > > >
> > > > > > >> > > > > > I tend to agree with Maksim here, let's keep it simple and explicit.
> > > > > > >> > > > > > Log a warning, but don't do anything clever.
> > > > >> > > > > >
> > > > >> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <
> > > > >> > ivandasch@gmail.com>
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > >> idleTimeout already exists, I don't think we should
> > > > change
> > > > >> the
> > > > >> > > way
> > > > >> > > > > it
> > > > >> > > > > > > works (or did I misunderstand you?)
> > > > >> > > > > > > If we use new approach, we can reduce this timeout.
> But
> > > this
> > > > >> can
> > > > >> > > > affect
> > > > >> > > > > > old
> > > > >> > > > > > > clients.
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > Also, let's think about that sending heartbeats and
> > > interval
> > > > >> of
> > > > >> > > > sending
> > > > >> > > > > > > heartbeats could be calculated on the server side
> (i.e.
> > > one
> > > > >> third
> > > > >> > > of
> > > > >> > > > > idle
> > > > >> > > > > > > timeout) and sent to the client
> > > > >> > > > > > > during handshake.
> > > > >> > > > > > > Also we can introduce something like a negotiation
> > > mechanism
> > > > >> as
> > > > >> > in
> > > > >> > > > > > > zookeeper.
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > > > >> > > > > > > Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <ptupitsyn@apache.org>:
> > > > >> > > > > > >
> > > > >> > > > > > > > Igor,
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Maybe clients should pass this information on to
> the
> > > > >> > handshake.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Do you think we should log a mismatched timeout
> > warning
> > > on
> > > > >> the
> > > > >> > > > > server,
> > > > >> > > > > > > not
> > > > >> > > > > > > > on the client?
> > > > >> > > > > > > > Or should we do both?
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT
> and
> > > > some
> > > > >> > other
> > > > >> > > > > > details
> > > > >> > > > > > > > discussed above.
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <
> > > > >> isapego@apache.org
> > > > >> > >
> > > > >> > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Feature seems useful for me as it makes connection
> > > > >> management
> > > > >> > > > more
> > > > >> > > > > > > robust
> > > > >> > > > > > > > > and
> > > > >> > > > > > > > > predictable.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I agree with Pavel, that we should print warning
> > when
> > > > >> > heartbeat
> > > > >> > > > > > period
> > > > >> > > > > > > is
> > > > >> > > > > > > > > larger than
> > > > >> > > > > > > > > idle timeout, but I see a problem here as idle
> > timeout
> > > > is
> > > > >> > > > > configured
> > > > >> > > > > > on
> > > > >> > > > > > > > > server and is not
> > > > >> > > > > > > > > known to clients, while heartbeats configured on
> > > clients
> > > > >> and
> > > > >> > > > their
> > > > >> > > > > > > period
> > > > >> > > > > > > > > is not known
> > > > >> > > > > > > > > to the server. Maybe clients should pass this
> > > > information
> > > > >> on
> > > > >> > to
> > > > >> > > > the
> > > > >> > > > > > > > > handshake.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Regarding Python and PHP clients - can not we use
> > some
> > > > >> kind
> > > > >> > of
> > > > >> > > > > timers
> > > > >> > > > > > > to
> > > > >> > > > > > > > > implement
> > > > >> > > > > > > > > this feature?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Best Regards,
> > > > >> > > > > > > > > Igor
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > > >> > > > > ptupitsyn@apache.org>
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > Maksim, agree. Let's not be too clever and only
> > log
> > > a
> > > > >> > > warning.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > > >> > > > > > ptupitsyn@apache.org>
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > Ivan, idleTimeout already exists, I don't
> think
> > we
> > > > >> should
> > > > >> > > > > change
> > > > >> > > > > > > the
> > > > >> > > > > > > > > way
> > > > >> > > > > > > > > > > it works (or did I misunderstand you?)
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Of course, enabling heartbeats means that
> > > otherwise
> > > > >> idle
> > > > >> > > > > clients
> > > > >> > > > > > > will
> > > > >> > > > > > > > > no
> > > > >> > > > > > > > > > > longer be disconnected by the server.
> > > > >> > > > > > > > > > > I think we should cross-link those properties
> in
> > > the
> > > > >> > > > > > documentation
> > > > >> > > > > > > > and
> > > > >> > > > > > > > > > > explain this behavior.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan
> Daschinsky <
> > > > >> > > > > > > ivandasch@gmail.com>
> > > > >> > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > >> >>3. Already implemented: when
> > > > >> > > > > > > > > ClientConnectorConfiguration#idleTimeout
> > > > >> > > > > > > > > > is
> > > > >> > > > > > > > > > >> not zero, server disconnects idle clients
> > > > >> > > > > > > > > > >> >>
> > > > >> > > > > > > > > > >> But I suppose it would be great to have:
> > > > >> > > > > > > > > > >> 1. If client supports keep alive, use
> > idleTimeout
> > > > >> > > > > > > > > > >> 2. If not, do not use it.
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >> But I am not sure if it is correct or not.
> > > > >> > > > > > > > > > >>
> > > > > > >> > > > > > > > > > >> Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org>:
> > > > >> > > > > > > > > > >>
> > > > >> > > > > > > > > > >> > I believe explicit is better than implicit
> :)
> > > > Also
> > > > >> in
> > > > >> > > case
> > > > >> > > > > of
> > > > >> > > > > > > > > dynamic
> > > > >> > > > > > > > > > >> > calculation of timeout, it can change
> > > > dynamically,
> > > > >> for
> > > > >> > > > > example
> > > > >> > > > > > > > > > >> restarting a
> > > > >> > > > > > > > > > >> > cluster with different configuration should
> > > > >> > reconfigure
> > > > >> > > > > > clients
> > > > >> > > > > > > > too.
> > > > >> > > > > > > > > > >> Looks
> > > > >> > > > > > > > > > >> > complicated.
> > > > >> > > > > > > > > > >> >
> > > > >> > > > > > > > > > >> > My vote for WARN + javadocs with mention of
> > > this
> > > > >> > issue.
> > > > >> > > > > > > > > > >> >
> > > > >> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel
> > Tupitsyn <
> > > > >> > > > > > > > ptupitsyn@apache.org
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > wrote:
> > > > >> > > > > > > > > > >> >
> > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for
> > > > clients
> > > > >> > that
> > > > >> > > > > > > configure
> > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than
> idleTimeout
> > > on
> > > > >> the
> > > > >> > > > server
> > > > >> > > > > > > side?
> > > > >> > > > > > > > > > >> > >
> > > > >> > > > > > > > > > >> > > I think we should either log a WARN, or
> > > > retrieve
> > > > >> > > > > idleTimeout
> > > > >> > > > > > > > from
> > > > >> > > > > > > > > > >> server
> > > > >> > > > > > > > > > >> > > and configure heartbeatTimeout
> accordingly
> > > > (e.g.
> > > > >> > > divide
> > > > >> > > > by
> > > > >> > > > > > 2).
> > > > >> > > > > > > > > > >> > > Thoughts?
> > > > >> > > > > > > > > > >> > >
> > > > >> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim
> > > Timonin <
> > > > >> > > > > > > > > > >> timoninmaxim@apache.org>
> > > > >> > > > > > > > > > >> > > wrote:
> > > > >> > > > > > > > > > >> > >
> > > > >> > > > > > > > > > >> > > > Hi Pavel,
> > > > >> > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot
> that
> > > the
> > > > >> flag
> > > > >> > of
> > > > >> > > > > > changed
> > > > >> > > > > > > > > > >> topology
> > > > >> > > > > > > > > > >> > is
> > > > >> > > > > > > > > > >> > > > lazy. Also I missed that the keepAlive
> > > > setting
> > > > >> is
> > > > >> > > > > > configured
> > > > >> > > > > > > > on
> > > > >> > > > > > > > > > the
> > > > >> > > > > > > > > > >> > > client
> > > > >> > > > > > > > > > >> > > > side (alternatively to idleTimeout that
> > is
> > > on
> > > > >> the
> > > > >> > > > server
> > > > >> > > > > > > > side).
> > > > >> > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > >> > > > Now I understand, this feature can be
> > > helpful
> > > > >> > then.
> > > > >> > > > > Every
> > > > >> > > > > > > > client
> > > > >> > > > > > > > > > can
> > > > >> > > > > > > > > > >> > > > configure itself in case it's possible
> to
> > > be
> > > > >> idle
> > > > >> > > > > > sometimes,
> > > > >> > > > > > > > and
> > > > >> > > > > > > > > > >> choose
> > > > >> > > > > > > > > > >> > > > an appropriate timeout by itself too.
> And
> > > by
> > > > >> > default
> > > > >> > > > the
> > > > >> > > > > > > > feature
> > > > >> > > > > > > > > > >> should
> > > > >> > > > > > > > > > >> > > be
> > > > >> > > > > > > > > > >> > > > disabled.
> > > > >> > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for
> > > > clients
> > > > >> > that
> > > > >> > > > > > > configure
> > > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than
> idleTimeout
> > > on
> > > > >> the
> > > > >> > > > server
> > > > >> > > > > > > side?
> > > > >> > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel
> > > > Tupitsyn <
> > > > >> > > > > > > > > > ptupitsyn@apache.org
> > > > >> > > > > > > > > > >> >
> > > > >> > > > > > > > > > >> > > > wrote:
> > > > >> > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > >> > > > > Ivan,
> > > > >> > > > > > > > > > >> > > > >
> > > > >> > > > > > > > > > >> > > > > I suggest the following:
> > > > >> > > > > > > > > > >> > > > >
> > > > >> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature
> > flag,
> > > > >> which
> > > > >> > > means
> > > > >> > > > > it
> > > > >> > > > > > > > > accepts
> > > > >> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > > > >> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when
> the
> > > > >> > connection
> > > > >> > > is
> > > > >> > > > > > idle
> > > > >> > > > > > > > for
> > > > >> > > > > > > > > a
> > > > >> > > > > > > > > > >> > > > > certain period of time
> > > > >> > > > > > > > > > >> > > > > 3. Already implemented: when
> > > > >> > > > > > > > > > >> ClientConnectorConfiguration#idleTimeout
> > > > >> > > > > > > > > > >> > > is
> > > > >> > > > > > > > > > >> > > > > not zero, server disconnects idle
> > clients
> > > > >> > > > > > > > > > >> > > > >
> > > > >> > > > > > > > > > >> > > > > This way we don't need server->client
> > > > >> > keepalives,
> > > > >> > > as
> > > > >> > > > > you
> > > > >> > > > > > > > > > correctly
> > > > >> > > > > > > > > > >> > > noted.
> > > > >> > > > > > > > > > >> > > > >
> > > > >> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan
> > > > >> Daschinsky
> > > > >> > <
> > > > >> > > > > > > > > > >> ivandasch@gmail.com
> > > > >> > > > > > > > > > >> > >
> > > > >> > > > > > > > > > >> > > > > wrote:
> > > > >> > > > > > > > > > >> > > > >
> > > > >> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > > >> > > > > > > > > > >> > > > > > 1. The client sends a flag in the handshake saying that it supports
> > > > >> > > > > > > > > > >> > > > > > the KEEP_ALIVE feature, and the server takes it into account.
> > > > >> > > > > > > > > > >> > > > > > 2. Each client request can be considered a keep-alive ping.
> > > > >> > > > > > > > > > >> > > > > > 3. A client send failure should be processed using the retry policy.
> > > > >> > > > > > > > > > >> > > > > > 4. The server should not send keep-alive packets, as that is
> > > > >> > > > > > > > > > >> > > > > > redundant, but it should track requests from the client and, if there
> > > > >> > > > > > > > > > >> > > > > > are no requests from a client with the KEEP_ALIVE feature,
> > > > >> > > > > > > > > > >> > > > > > automatically close the connection and free resources.
> > > > >> > > > > > > > > > >> > > > > >
> > > > >> > > > > > > > > > >> > > > > > A similar approach is used in ZooKeeper clients.
> > > > >> > > > > > > > > > >> > > > > >
> > > > >> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > >> > > > > > > > > > >> > > > > >
> > > > >> > > > > > > > > > >> > > > > > > Ivan,
> > > > >> > > > > > > > > > >> > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > Ideally, the check should come from both sides.
> > > > >> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to server
> > > > >> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to client
> > > > >> > > > > > > > > > >> > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it is not necessary to
> > > > >> > > > > > > > > > >> > > > > > > implement this in all thin clients.
> > > > >> > > > > > > > > > >> > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com>
> > > > >> > > > > > > > > > >> > > > > > > wrote:
> > > > >> > > > > > > > > > >> > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > I suppose it is a great idea, but this functionality can be hard to
> > > > >> > > > > > > > > > >> > > > > > > > implement for some platforms, e.g. the sync Python client or PHP
> > > > >> > > > > > > > > > >> > > > > > > > (there is no real multithreading in Python because of the GIL, and
> > > > >> > > > > > > > > > >> > > > > > > > PHP is single-threaded by design). But for async clients it is not
> > > > >> > > > > > > > > > >> > > > > > > > very hard to implement. Nevertheless, this feature should be
> > > > >> > > > > > > > > > >> > > > > > > > optional, because of possible technical limitations.
> > > > >> > > > > > > > > > >> > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for the client side? Or can servers take
> > > > >> > > > > > > > > > >> > > > > > > > some action if there is no activity from a thin client (e.g. closing
> > > > >> > > > > > > > > > >> > > > > > > > the context and freeing resources such as query handles and so on)?
> > > > >> > > > > > > > > > >> > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > >> > > > > > > > > > >> > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation when an Ignite node goes down or
> > > > >> > > > > > > > > > >> > > > > > > > > > somehow removes connection to a thin client
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when, for example, an intermediate
> > > > >> > > > > > > > > > >> > > > > > > > > router is rebooted [1].
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > This is what we seem to have encountered with one of our customers -
> > > > >> > > > > > > > > > >> > > > > > > > > they have a stable cluster, and long-living (multiple days) thin
> > > > >> > > > > > > > > > >> > > > > > > > > client connections which can be idle for some time.
> > > > >> > > > > > > > > > >> > > > > > > > > And only when we send some data on such an idle connection do we
> > > > >> > > > > > > > > > >> > > > > > > > > discover that it is broken.
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness feature
> > > > >> > > > > > > > > > >> > > > > > > > > > clients can be notified about topology changes
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification in the form of a
> > > > >> > > > > > > > > > >> > > > > > > > > response message flag [2].
> > > > >> > > > > > > > > > >> > > > > > > > > You won't get one on an idle connection.
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > the connections are removed on the server side by client idle
> > > > >> > > > > > > > > > >> > > > > > > > > > timeout
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > is it OK to keep such connections alive for a long time
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > in the case of partition awareness features it can lead to wasting
> > > > >> > > > > > > > > > >> > > > > > > > > > TCP sockets on Ignite nodes, can't it
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > >> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > >> > > > > > > > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here to
> > > > >> > > > > > > > > > >> > > > > > > > > > understand the feature more clearly?
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > As I understand it, half-state is a possible situation when an Ignite
> > > > >> > > > > > > > > > >> > > > > > > > > > node goes down or somehow removes connection to a thin client. But
> > > > >> > > > > > > > > > >> > > > > > > > > > with enabled (true by default) partitionAwareness feature clients can
> > > > >> > > > > > > > > > >> > > > > > > > > > be notified about topology changes. So, there are possible cases:
> > > > >> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > > >> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps fail
> > > > >> > > > > > > > > > >> > > > > > > > > > fast. But is it OK to connect a client to a single node only?
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second option is
> > > > >> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly idle connections are especially susceptible
> > > > >> > > > > > > > > > >> > > > > > > > > > to this behavior". If I understand correctly, the connections are
> > > > >> > > > > > > > > > >> > > > > > > > > > removed on the server side by client idle timeout. Can we just
> > > > >> > > > > > > > > > >> > > > > > > > > > configure the idle timeout for cases where we really need to keep
> > > > >> > > > > > > > > > >> > > > > > > > > > idle connections alive? Are there any other cases with unexpectedly
> > > > >> > > > > > > > > > >> > > > > > > > > > dropped connections?
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > I'm wondering: is it OK to keep such connections alive for a long
> > > > >> > > > > > > > > > >> > > > > > > > > > time? Also in the case of partition awareness features it can lead to
> > > > >> > > > > > > > > > >> > > > > > > > > > wasting TCP sockets on Ignite nodes, can't it?
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > Thanks!
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > >> > > > > > > > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > > >> Igniters,
> > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin
> > > > >> > > > > > > > > > >> > > > > > > > > >> client protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > >> > > > > > > > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > >
> > > > >> > > > > > > > > > >> > > > > > > > --
> > > > >> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
> What about "keepAlive", "keepAliveInterval" then? It looks more common
> and matches the IEP title :)

According to Google, HeartbeatInterval has ~169K results, and
KeepAliveInterval has ~110K :)

In my experience, both are well understood. I am equally willing to use
either of them.
Any other opinions?
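
For reference, the interval selection summarized in the quoted message below
could be sketched in Java roughly as follows. This is only an illustration,
not the code from the PR: the class and method names
(HeartbeatIntervalSketch, resolve) are made up for this example, the server
idle timeout is assumed to be already known to the client in milliseconds
(0 meaning disabled), and the lower bound of 1 ms is an extra guard added
here.

```java
import java.util.OptionalLong;

/** Illustrative sketch only; not part of the Ignite API. */
final class HeartbeatIntervalSketch {
    /** Default from the thread: 1 minute. */
    static final long DEFAULT_HEARTBEAT_INTERVAL_MS = 60_000L;

    private HeartbeatIntervalSketch() {
        // Static helpers only.
    }

    /**
     * Picks the client heartbeat interval.
     *
     * @param heartbeatsEnabled   client-side switch (true by default in the proposal)
     * @param serverIdleTimeoutMs server idle timeout in ms; 0 or less means disabled
     * @return the interval in ms, or empty when heartbeats are disabled
     */
    static OptionalLong resolve(boolean heartbeatsEnabled, long serverIdleTimeoutMs) {
        if (!heartbeatsEnabled)
            return OptionalLong.empty();

        // One third of the idle timeout when the server has one, else the default.
        long interval = serverIdleTimeoutMs > 0
            ? serverIdleTimeoutMs / 3
            : DEFAULT_HEARTBEAT_INTERVAL_MS;

        // Assumption of this sketch: never return 0 for very small idle timeouts.
        return OptionalLong.of(Math.max(1L, interval));
    }
}
```

For example, with a server idleTimeout of 9000 ms the client would send a
heartbeat every 3000 ms; with idleTimeout disabled it would fall back to the
60000 ms default.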

On Tue, Feb 15, 2022 at 6:11 PM Maksim Timonin <ti...@apache.org>
wrote:

> What about "keepAlive", "keepAliveInterval" then? It looks more common and
> matches the IEP title :)
>
> On Tue, Feb 15, 2022 at 5:54 PM Pavel Tupitsyn <pt...@apache.org>
> wrote:
>
> > To summarize, we add two properties to the ClientConfiguration:
> > bool heartbeatsEnabled = true;
> > long defaultHeartbeatInterval = 60_000; // Default 1 minute, used
> >
> > Logic:
> > if (heartbeatsEnabled) {
> >   heartbeatInterval = serverIdleTimeout > 0 ? serverIdleTimeout / 3 :
> > defaultHeartbeatInterval;
> > }
> >
> >
> > Thoughts, objections?
> >
> > On Tue, Feb 15, 2022 at 4:32 PM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> > > > Pavel, sorry, I've made a mistake. But the current behaviour is OK for
> > > > me. This timeout cannot be changed on the server side at runtime. But
> > > > we can simplify the protocol and use just one opcode and message.
> > >
> > > > On Tue, Feb 15, 2022 at 14:54, Ivan Daschinsky <iv...@gmail.com> wrote:
> > > >
> > > > > > Idle timeout can't change, why send it back with every heartbeat
> > > > > > response?
> > > > > Maybe I am wrong, but this is the behaviour I see in the code. But if I
> > > > > am wrong, this behaviour is OK for me.
> > > >
> > > >
> > > >
> > > > > On Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <pt...@apache.org> wrote:
> > > > >
> > > > >> Ivan, I mostly agree with your proposal, except this point:
> > > > >>
> > > > >> > Response to heartbeat request -- is idle timeout
> > > > >> Idle timeout can't change, why send it back with every heartbeat
> > > > >> response?
> > > > >>
> > > > >> > possible cases with cluster restart, upgrade
> > > > >> In those cases, a new connection will be established, and we'll
> > > > >> retrieve the new timeout after the handshake.
> > > >>
> > > >>
> > > > >> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Ivan,
> > > > >> >
> > > > >> > The cases you described sound reasonable to me. Then the client should
> > > > >> > just set the `keepAlive` flag, and it just works.
> > > > >> >
> > > > >> > So, there are 3 branches:
> > > > >> > 1. Users don't configure keepAlive at all.
> > > > >> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
> > > > >> > 3. Users configure keepAlive (boolean).
> > > > >> >
> > > > >> > AFAIU, Pavel's proposal is about covering the second case only. But
> > > > >> > actually the 2nd and 3rd don't conflict with each other. I think for
> > > > >> > both branches, a cluster should respond with the idleTimeout value on
> > > > >> > every keep-alive client request, because there are possible cases with
> > > > >> > cluster restart, upgrade, etc. Clients should check every response for
> > > > >> > a changed idleTimeout: in the 2nd case, write a WARN message; in the
> > > > >> > 3rd, reconfigure themselves.
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > > >> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <ivandasch@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Regarding discussion here [1]
> > > > >> > >
> > > > >> > > I suppose that this feature, despite the fact that Pavel's initial
> > > > >> > > intention was different, can drastically improve the usage pattern of
> > > > >> > > thin clients and give a lot of opportunities if the following is done:
> > > > >> > >
> > > > >> > > 1. GridNioServer has a great feature -- idle timeout. If a server does
> > > > >> > > not receive anything from a client, the client will be kicked off.
> > > > >> > >     But there are some scenarios that make the use of this feature
> > > > >> > > impossible:
> > > > >> > > a. Multiple workers waiting for batch tasks with a relatively low
> > > > >> > > request rate -- these services will often be kicked off and must
> > > > >> > > reconnect. In order to prevent this behaviour, the user must implement
> > > > >> > > a kind of heartbeating himself.
> > > > >> > > b. Quite often a user may want to implement the leader-follower pattern
> > > > >> > > for services for HA, so followers will also be considered idle. Kicking
> > > > >> > > off these followers is not acceptable, so the user also has to
> > > > >> > > implement heartbeating himself.
> > > >> > >
> > > > >> > > My proposition is:
> > > > >> > > 1. Add two settings -- a flag to enable/disable heartbeats, and an
> > > > >> > > optional heartbeat timeout. Set the flag to true by default, and the
> > > > >> > > timeout to the default heartbeat timeout.
> > > > >> > > 2. If server and client both support this feature, and heartbeats are
> > > > >> > > not explicitly disabled on the client side:
> > > > >> > > 3. The response to a heartbeat request is the idle timeout. If the idle
> > > > >> > > timeout is set on the server side, set the heartbeat timeout to
> > > > >> > > one-third of it; otherwise set it to the default or specified value.
> > > >> > >
> > > > >> > > Pros:
> > > > >> > > 1. Easy to set up -- just a flag on the client side and a timeout on
> > > > >> > > the server side.
> > > > >> > > 2. Hard to configure improperly, i.e. to set a heartbeat timeout that
> > > > >> > > is not short enough to prevent being kicked out by the server.
> > > > >> > > 3. If the user just wants heartbeats without setting an idle timeout --
> > > > >> > > heartbeats are on by default, with a reasonable timeout.
> > > > >> > >
> > > > >> > > Cons:
> > > > >> > > 1. If someone relies on the old behavior and just wants to drop his
> > > > >> > > clients on timeout -- this will not work without reconfiguring; he
> > > > >> > > should disable heartbeats.
> > > > >> > > But I cannot even imagine that someone will find this behaviour
> > > > >> > > desirable. I strongly believe that this behaviour prevents users from
> > > > >> > > using idleTimeout on the server side.
> > > > >> > >
> > > > >> > > [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
> > > >> > >
> > > >> > > пт, 11 февр. 2022 г. в 10:58, Pavel Tupitsyn <
> > ptupitsyn@apache.org
> > > >:
> > > >> > >
> > > >> > > > I've prepared a PR, please have a look:
> > > >> > > > https://github.com/apache/ignite/pull/9817
> > > >> > > >
> > > > >> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > I see potential in this feature, especially if we use something like
> > > > >> > > > > continuous queries. Stale clients can consume a lot of resources, and
> > > > >> > > > > it is worth kicking these clients out.
> > > >> > > > >
> > > >> > > > > пн, 7 февр. 2022 г. в 18:25, Pavel Tupitsyn <
> > > ptupitsyn@apache.org
> > > >> >:
> > > >> > > > >
> > > >> > > > > > > If we use new approach, we can reduce this timeout. But
> > this
> > > >> can
> > > >> > > > affect
> > > >> > > > > > old clients.
> > > >> > > > > >
> > > >> > > > > > idleTimeout is disabled by default, we are not going to
> > change
> > > >> > this.
> > > >> > > > > >
> > > >> > > > > > > Also, let's think about that sending heartbeats and
> > interval
> > > >> of
> > > >> > > > sending
> > > >> > > > > > > heartbeats could be calculated on the server side (i.e.
> > one
> > > >> third
> > > >> > > of
> > > >> > > > > idle
> > > >> > > > > > > timeout) and sent to the client during handshake.
> > > >> > > > > > > Also we can introduce something like a negotiation
> > mechanism
> > > >> as
> > > >> > in
> > > >> > > > > > > zookeeper.
> > > >> > > > > >
> > > >> > > > > > I tend to agree with Maksim here, let's keep it simple and
> > > >> > explicit.
> > > >> > > > > > Log a warning, but don't do anything clever.
> > > >> > > > > >
> > > >> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <
> > > >> > ivandasch@gmail.com>
> > > >> > > > > > wrote:
> > > >> > > > > >
> > > >> > > > > > > >> idleTimeout already exists, I don't think we should
> > > change
> > > >> the
> > > >> > > way
> > > >> > > > > it
> > > >> > > > > > > works (or did I misunderstand you?)
> > > >> > > > > > > If we use new approach, we can reduce this timeout. But
> > this
> > > >> can
> > > >> > > > affect
> > > >> > > > > > old
> > > >> > > > > > > clients.
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > Also, let's think about that sending heartbeats and
> > interval
> > > >> of
> > > >> > > > sending
> > > >> > > > > > > heartbeats could be calculated on the server side (i.e.
> > one
> > > >> third
> > > >> > > of
> > > >> > > > > idle
> > > >> > > > > > > timeout) and sent to the client
> > > >> > > > > > > during handshake.
> > > >> > > > > > > Also we can introduce something like a negotiation
> > mechanism
> > > >> as
> > > >> > in
> > > >> > > > > > > zookeeper.
> > > >> > > > > > >
> > > >> > > > > > >
> > > >> > > > > > > пн, 7 февр. 2022 г. в 18:05, Pavel Tupitsyn <
> > > >> > ptupitsyn@apache.org
> > > >> > > >:
> > > >> > > > > > >
> > > >> > > > > > > > Igor,
> > > >> > > > > > > >
> > > >> > > > > > > > > Maybe clients should pass this information on to the
> > > >> > handshake.
> > > >> > > > > > > >
> > > >> > > > > > > > Do you think we should log a mismatched timeout
> warning
> > on
> > > >> the
> > > >> > > > > server,
> > > >> > > > > > > not
> > > >> > > > > > > > on the client?
> > > >> > > > > > > > Or should we do both?
> > > >> > > > > > > >
> > > >> > > > > > > >
> > > >> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and
> > > some
> > > >> > other
> > > >> > > > > > details
> > > >> > > > > > > > discussed above.
> > > >> > > > > > > >
> > > >> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <
> > > >> isapego@apache.org
> > > >> > >
> > > >> > > > > wrote:
> > > >> > > > > > > >
> > > >> > > > > > > > > Feature seems useful for me as it makes connection
> > > >> management
> > > >> > > > more
> > > >> > > > > > > robust
> > > >> > > > > > > > > and
> > > >> > > > > > > > > predictable.
> > > >> > > > > > > > >
> > > >> > > > > > > > > I agree with Pavel, that we should print warning
> when
> > > >> > heartbeat
> > > >> > > > > > period
> > > >> > > > > > > is
> > > >> > > > > > > > > larger than
> > > >> > > > > > > > > idle timeout, but I see a problem here as idle
> timeout
> > > is
> > > >> > > > > configured
> > > >> > > > > > on
> > > >> > > > > > > > > server and is not
> > > >> > > > > > > > > known to clients, while heartbeats configured on
> > clients
> > > >> and
> > > >> > > > their
> > > >> > > > > > > period
> > > >> > > > > > > > > is not known
> > > >> > > > > > > > > to the server. Maybe clients should pass this
> > > information
> > > >> on
> > > >> > to
> > > >> > > > the
> > > >> > > > > > > > > handshake.
> > > >> > > > > > > > >
> > > >> > > > > > > > > Regarding Python and PHP clients - can not we use
> some
> > > >> kind
> > > >> > of
> > > >> > > > > timers
> > > >> > > > > > > to
> > > >> > > > > > > > > implement
> > > >> > > > > > > > > this feature?
> > > >> > > > > > > > >
> > > >> > > > > > > > > Best Regards,
> > > >> > > > > > > > > Igor
> > > >> > > > > > > > >
> > > >> > > > > > > > >
> > > >> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > >> > > > > ptupitsyn@apache.org>
> > > >> > > > > > > > > wrote:
> > > >> > > > > > > > >
> > > >> > > > > > > > > > Maksim, agree. Let's not be too clever and only
> log
> > a
> > > >> > > warning.
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > >> > > > > > > > > >
> > > >> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we should change the way
> > > >> > > > > > > > > > > it works (or did I misunderstand you?)
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > Of course, enabling heartbeats means that otherwise idle clients will no
> > > >> > > > > > > > > > > longer be disconnected by the server.
> > > >> > > > > > > > > > > I think we should cross-link those properties in the documentation and
> > > >> > > > > > > > > > > explain this behavior.
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com> wrote:
> > > >> > > > > > > > > > >
> > > >> > > > > > > > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> > > >> > > > > > > > > > >> >> not zero, server disconnects idle clients
> > > >> > > > > > > > > > >> >>
> > > >> > > > > > > > > > >> But I suppose it would be great to have:
> > > >> > > > > > > > > > >> 1. If client supports keep alive, use idleTimeout
> > > >> > > > > > > > > > >> 2. If not, do not use it.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> But I am not sure if it is correct or not.
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> On Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org> wrote:
> > > >> > > > > > > > > > >>
> > > >> > > > > > > > > > >> > I believe explicit is better than implicit :) Also, in case of dynamic
> > > >> > > > > > > > > > >> > calculation of the timeout, it can change dynamically: for example,
> > > >> > > > > > > > > > >> > restarting a cluster with a different configuration should reconfigure
> > > >> > > > > > > > > > >> > clients too. Looks complicated.
> > > >> > > > > > > > > > >> >
> > > >> > > > > > > > > > >> > My vote is for WARN + javadocs with a mention of this issue.
> > > >> > > > > > > > > > >> >
> > > >> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > >> > > > > > > > > > >> >
> > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
> > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > > >> > > > > > > > > > >> > >
> > > >> > > > > > > > > > >> > > I think we should either log a WARN, or retrieve idleTimeout from the
> > > >> > > > > > > > > > >> > > server and configure heartbeatTimeout accordingly (e.g. divide by 2).
> > > >> > > > > > > > > > >> > > Thoughts?
> > > >> > > > > > > > > > >> > >
> > > >> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <timoninmaxim@apache.org> wrote:
> > > >> > > > > > > > > > >> > >
> > > >> > > > > > > > > > >> > > > Hi Pavel,
> > > >> > > > > > > > > > >> > > >
> > > >> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the flag of changed topology is
> > > >> > > > > > > > > > >> > > > lazy. Also I missed that the keepAlive setting is configured on the client
> > > >> > > > > > > > > > >> > > > side (as opposed to idleTimeout, which is on the server side).
> > > >> > > > > > > > > > >> > > >
> > > >> > > > > > > > > > >> > > > Now I understand, this feature can be helpful then. Every client can
> > > >> > > > > > > > > > >> > > > configure itself in case it's possible to be idle sometimes, and choose
> > > >> > > > > > > > > > >> > > > an appropriate timeout by itself too. And by default the feature should
> > > >> > > > > > > > > > >> > > > be disabled.
> > > >> > > > > > > > > > >> > > >
> > > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
> > > >> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > > >> > > > > > > > > > >> > > >
> > > >> > > > > > > > > > >> > > >
> > > >> > > > > > > > > > >> > > >
> > > >> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > >> > > > > > > > > > >> > > >
> > > >> > > > > > > > > > >> > > > > Ivan,
> > > >> > > > > > > > > > >> > > > >
> > > >> > > > > > > > > > >> > > > > I suggest the following:
> > > >> > > > > > > > > > >> > > > >
> > > >> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> > > >> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > > >> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> > > >> > > > > > > > > > >> > > > > certain period of time
> > > >> > > > > > > > > > >> > > > > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> > > >> > > > > > > > > > >> > > > > not zero, server disconnects idle clients
> > > >> > > > > > > > > > >> > > > >
> > > >> > > > > > > > > > >> > > > > This way we don't need server->client keepalives, as you correctly noted.
> > > >> > > > > > > > > > >> > > > >
> > > >> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <ivandasch@gmail.com> wrote:
> > > >> > > > > > > > > > >> > > > >
> > > >> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > >> > > > > > > > > > >> > > > > > 1. The client sends a flag in the handshake indicating that it supports
> > > >> > > > > > > > > > >> > > > > > the KEEP_ALIVE feature, and the server takes it into account.
> > > >> > > > > > > > > > >> > > > > > 2. Each client request can be considered a keep-alive ping.
> > > >> > > > > > > > > > >> > > > > > 3. A client send failure should be processed using the retry policy.
> > > >> > > > > > > > > > >> > > > > > 4. The server should not send keep-alive packets, it is redundant, but
> > > >> > > > > > > > > > >> > > > > > the server should track requests from the client, and if there are no
> > > >> > > > > > > > > > >> > > > > > requests from a client with the KEEP_ALIVE feature, automatically close
> > > >> > > > > > > > > > >> > > > > > the connection and free resources.
> > > >> > > > > > > > > > >> > > > > >
> > > >> > > > > > > > > > >> > > > > > A similar approach is used in ZooKeeper clients.
> > > >> > > > > > > > > > >> > > > > >
> > > >> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > >> > > > > > > > > > >> > > > > >
> > > >> > > > > > > > > > >> > > > > > > Ivan,
> > > >> > > > > > > > > > >> > > > > > >
> > > >> > > > > > > > > > >> > > > > > > Ideally, the check should come from both sides.
> > > >> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to server
> > > >> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to client
> > > >> > > > > > > > > > >> > > > > > >
> > > >> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it is not necessary to
> > > >> > > > > > > > > > >> > > > > > > implement this in all thin clients.
> > > >> > > > > > > > > > >> > > > > > >
> > > >> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com> wrote:
> > > >> > > > > > > > > > >> > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > I suppose it is a great idea, but this functionality can be hard to
> > > >> > > > > > > > > > >> > > > > > > > implement for some platforms, e.g. the sync Python client or PHP (there
> > > >> > > > > > > > > > >> > > > > > > > is no real multithreading in Python because of the GIL, and PHP is
> > > >> > > > > > > > > > >> > > > > > > > single-threaded by design). But for async clients it is not very hard
> > > >> > > > > > > > > > >> > > > > > > > to implement. Nevertheless, this feature should be optional because of
> > > >> > > > > > > > > > >> > > > > > > > possible technical limitations.
> > > >> > > > > > > > > > >> > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for the client side? Or can servers take
> > > >> > > > > > > > > > >> > > > > > > > some action if there is no activity from a thin client (i.e. closing
> > > >> > > > > > > > > > >> > > > > > > > the context and freeing resources such as query handles and so on)?
> > > >> > > > > > > > > > >> > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > >> > > > > > > > > > >> > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation when an Ignite node goes down or
> > > >> > > > > > > > > > >> > > > > > > > > > somehow removes connection to a thin client
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when, for example, an intermediate
> > > >> > > > > > > > > > >> > > > > > > > > router is rebooted [1].
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > This is what we seem to have encountered with one of our customers -
> > > >> > > > > > > > > > >> > > > > > > > > they have a stable cluster, and long-living (multiple days) thin client
> > > >> > > > > > > > > > >> > > > > > > > > connections which can be idle for some time.
> > > >> > > > > > > > > > >> > > > > > > > > And only when we send some data on such an idle connection do we
> > > >> > > > > > > > > > >> > > > > > > > > discover that it is broken.
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness feature clients
> > > >> > > > > > > > > > >> > > > > > > > > > can be notified about topology changes
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification in a form of a response
> > > >> > > > > > > > > > >> > > > > > > > > message flag [2].
> > > >> > > > > > > > > > >> > > > > > > > > You won't get one on an idle connection.
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > the connections are removed on the server side by client idle timeout
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > is it OK to keep such connections alive for a long time
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > in the case of partition awareness features it can lead to wasting TCP
> > > >> > > > > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > >> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org> wrote:
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here to
> > > >> > > > > > > > > > >> > > > > > > > > > understand the feature more clearly?
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > As I understand it, half-state is a possible situation when an Ignite
> > > >> > > > > > > > > > >> > > > > > > > > > node goes down or somehow removes connection to a thin client. But with
> > > >> > > > > > > > > > >> > > > > > > > > > enabled (true by default) partitionAwareness feature clients can be
> > > >> > > > > > > > > > >> > > > > > > > > > notified about topology changes. So, there are possible cases:
> > > >> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > >> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps fail fast.
> > > >> > > > > > > > > > >> > > > > > > > > > But is it OK to connect a client to a single node only?
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second option is
> > > >> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly idle connections are especially susceptible to
> > > >> > > > > > > > > > >> > > > > > > > > > this behavior". If I understand correctly the connections are removed
> > > >> > > > > > > > > > >> > > > > > > > > > on the server side by client idle timeout. Can we just configure the
> > > >> > > > > > > > > > >> > > > > > > > > > idle timeout for cases where we really need keeping alive idle
> > > >> > > > > > > > > > >> > > > > > > > > > connections? Are there any other cases with unexpectedly dropped
> > > >> > > > > > > > > > >> > > > > > > > > > connections?
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such connections alive for a long time?
> > > >> > > > > > > > > > >> > > > > > > > > > Also in the case of partition awareness features it can lead to wasting
> > > >> > > > > > > > > > >> > > > > > > > > > TCP sockets on Ignite nodes, can't it?
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > Thanks!
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > > >> Igniters,
> > > >> > > > > > > > > > >> > > > > > > > > >>
> > > >> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin client
> > > >> > > > > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > > >> > > > > > > > > > >> > > > > > > > > >>
> > > >> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > >> > > > > > > > > > >> > > > > > > > > >>
> > > >> > > > > > > > > > >> > > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > >
> > > >> > > > > > > > > > >> > > > > > > > --
> > > >> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> > > >> > > > > > > > > > >> > > > > > > >

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Maksim Timonin <ti...@apache.org>.
What about "keepAlive", "keepAliveInterval" then? It looks more common and
matches the IEP title :)

On Tue, Feb 15, 2022 at 5:54 PM Pavel Tupitsyn <pt...@apache.org> wrote:

> To summarize, we add two properties to the ClientConfiguration:
> bool heartbeatsEnabled = true;
> long defaultHeartbeatInterval = 60_000; // Default 1 minute, used
>
> Logic:
> if (heartbeatsEnabled) {
>   heartbeatInterval = serverIdleTimeout > 0 ? serverIdleTimeout / 3 : defaultHeartbeatInterval;
> }
>
>
> Thoughts, objections?
>
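The interval logic summarized in the quoted message above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names (HeartbeatIntervalSketch, resolveIntervalMs) are hypothetical and not part of the Ignite API, returning 0 for "disabled" is an arbitrary convention, and the 1 ms lower bound is an addition so that a very small idle timeout does not round down to zero.

```java
// Hypothetical helper for illustration only; not part of the Ignite API.
final class HeartbeatIntervalSketch {
    /** Default from the summary above: 1 minute. */
    static final long DEFAULT_HEARTBEAT_INTERVAL_MS = 60_000L;

    private HeartbeatIntervalSketch() {
        // Static helper, no instances.
    }

    /**
     * Picks the heartbeat interval in milliseconds.
     * Returns 0 when heartbeats are disabled (a convention assumed here).
     */
    static long resolveIntervalMs(boolean heartbeatsEnabled, long serverIdleTimeoutMs, long defaultIntervalMs) {
        if (!heartbeatsEnabled)
            return 0L;

        // Server idle timeout is set: use one third of it, as proposed in the thread.
        // The lower bound of 1 ms is an assumption, not something stated in the thread.
        if (serverIdleTimeoutMs > 0)
            return Math.max(1L, serverIdleTimeoutMs / 3);

        // Idle timeout disabled on the server: fall back to the default interval.
        return defaultIntervalMs;
    }
}
```

For example, with a server idle timeout of 30 seconds the sketch yields a 10 second interval; with idle timeout disabled it yields the default.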
> On Tue, Feb 15, 2022 at 4:32 PM Ivan Daschinsky <iv...@gmail.com>
> wrote:
>
> > Pavel, sorry, I've made a mistake. But the current behaviour is OK for me. This
> > timeout cannot be changed on the server side at runtime. But we can simplify
> > the protocol and just use one opcode and message.
> >
> > On Tue, Feb 15, 2022 at 14:54, Ivan Daschinsky <iv...@gmail.com> wrote:
> >
> > > > Idle timeout can't change, why send it back with every heartbeat
> > > > response?
> > > Maybe I am wrong, but from the code I see this behaviour. But if I am wrong,
> > > this is OK behaviour for me.
> > >
> > >
> > >
> > > On Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <pt...@apache.org> wrote:
> > >
> > >> Ivan, I mostly agree with your proposal, except this point:
> > >>
> > >> > Response to heartbeat request -- is idle timeout
> > >> Idle timeout can't change, why send it back with every heartbeat response?
> > >>
> > >> > possible cases with cluster restart, upgrade
> > >> In those cases, a new connection will be established, and we'll retrieve
> > >> the new timeout after the handshake.
> > >>
> > >>
> > >> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <timoninmaxim@apache.org>
> > >> wrote:
> > >>
> > >> > Hi Ivan,
> > >> >
> > >> > Cases you described sound reasonable to me. Then the client should just
> > >> > set up the `keepAlive` flag, and it just works.
> > >> >
> > >> > So, there are 3 branches:
> > >> > 1. Users don't configure keepAlive at all.
> > >> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
> > >> > 3. Users configure keepAlive (boolean).
> > >> >
> > >> > AFAIU, Pavel's proposal is about covering the second case only. But
> > >> > actually the 2nd and 3rd don't conflict with each other. I think for both
> > >> > branches, a cluster should respond with the idleTimeout value on every
> > >> > keep-alive client request, because there are possible cases with cluster
> > >> > restart, upgrade, etc. Clients should check every response for a changed
> > >> > idleTimeout: in the 2nd case write a WARN message, and in the 3rd
> > >> > reconfigure themselves.
> > >> >
> > >> >
> > >> >
> > >> >
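For the branch where the user sets an explicit interval, the WARN check discussed in the quoted message above could look roughly like this. It is a sketch under assumptions: IdleTimeoutWarningSketch and check are hypothetical names, not Ignite API, and treating an interval equal to the idle timeout as a problem (">=" rather than ">") is a choice made here, not something the thread specifies.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the Ignite API.
final class IdleTimeoutWarningSketch {
    private IdleTimeoutWarningSketch() {
        // Static helper, no instances.
    }

    /**
     * Returns a warning text when an explicitly configured heartbeat interval
     * cannot keep the connection alive, or an empty Optional otherwise.
     */
    static Optional<String> check(long heartbeatIntervalMs, long serverIdleTimeoutMs) {
        // Idle timeout is disabled on the server: nothing can be dropped, no warning.
        if (serverIdleTimeoutMs <= 0)
            return Optional.empty();

        // Assumption: an interval equal to the idle timeout is already too long.
        if (heartbeatIntervalMs >= serverIdleTimeoutMs) {
            return Optional.of("Heartbeat interval (" + heartbeatIntervalMs
                + " ms) is not less than server idle timeout (" + serverIdleTimeoutMs
                + " ms); idle connections may be closed by the server.");
        }

        return Optional.empty();
    }
}
```

A client could call check once it learns the server's idle timeout and pass any returned text to its logger at WARN level.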
> > >> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <
> ivandasch@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Regarding discussion here [1]
> > >> > >
> > >> > > I suppose that this feature, despite the fact that initial
> intention
> > >> of
> > >> > > Pavel was different, can drastically
> > >> > > improve the usage pattern of thin clients and give a lot of
> > >> opportunities
> > >> > > if the following is done:
> > >> > >
> > >> > > 1. GridNioServer has a great feature -- idle timeout. If  a server
> > did
> > >> > not
> > >> > > receive any from a client -- it will be kicked off.
> > >> > >     But there are some scenarios that make the use of this feature
> > >> > > impossible:
> > >> > > a. Multiple workers waiting for batch tasks and relatively low
> > >> requests
> > >> > > rate -- this services will be often kicked off and must reconnect.
> > >> > > In order to prevent this behaviour, the user must implement a kind
> > of
> > >> > > heartbeating by himself.
> > >> > > b. Quite often user may want to implement leader-follower pattern
> > for
> > >> > > services for HA, so followers also will be considered as idle.
> > Kicking
> > >> > off
> > >> > > these followers
> > >> > > is not acceptable, so user  should also implement heartbeating by
> > >> > himself.
> > >> > >
> > >> > > My proposition is:
> > >> > > 1. Add two flags -- enable/disable heartbeats, and a strictly optional
> > >> > > heartbeat timeout. Enable heartbeats by default and set the timeout
> > >> > > to the default heartbeat timeout.
> > >> > > 2. If server and client both support this feature, and heartbeats are
> > >> > > not explicitly disabled on the client side:
> > >> > > 3. The response to a heartbeat request is the idle timeout. If the
> > >> > > idle timeout is set on the server side, set the heartbeat timeout to
> > >> > > one-third of it; otherwise set it to the default or specified value.
> > >> > >
> > >> > > Pros:
> > >> > > 1. Easy to set up -- just a flag on the client side and a timeout on
> > >> > > the server side.
> > >> > > 2. Hard to misconfigure, e.g. by setting a heartbeat timeout that is
> > >> > > not short enough to prevent being kicked out by the server.
> > >> > > 3. If the user just wants heartbeats without setting an idle timeout
> > >> > > -- heartbeats are on by default, with a reasonable timeout.
> > >> > >
> > >> > > Cons:
> > >> > > 1. If someone relies on the old behaviour and just wants to drop
> > >> > > clients on timeout -- this will not work without reconfiguration;
> > >> > > they have to disable heartbeats.
> > >> > > But I cannot even imagine that someone would find this behaviour
> > >> > > desirable. I strongly believe that this behaviour prevents users from
> > >> > > using idleTimeout on the server side.
> > >> > >
> > >> > > [1] --
> > >> https://github.com/apache/ignite/pull/9817#discussion_r805628955
> > >> > >
> > >> > > On Fri, Feb 11, 2022 at 10:58, Pavel Tupitsyn <
> ptupitsyn@apache.org
> > >:
> > >> > >
> > >> > > > I've prepared a PR, please have a look:
> > >> > > > https://github.com/apache/ignite/pull/9817
> > >> > > >
> > >> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <
> > ivandasch@gmail.com
> > >> >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > I see potential in this feature, especially if we use something
> > >> > > > > like continuous queries. Stale clients can consume a lot of
> > >> > > > > resources, and it is worth kicking these clients out.
> > >> > > > >
> > >> > > > > On Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <
> > ptupitsyn@apache.org
> > >> >:
> > >> > > > >
> > >> > > > > > > If we use new approach, we can reduce this timeout. But
> this
> > >> can
> > >> > > > affect
> > >> > > > > > old clients.
> > >> > > > > >
> > >> > > > > > idleTimeout is disabled by default, we are not going to
> change
> > >> > this.
> > >> > > > > >
> > >> > > > > > > Also, let's think about that sending heartbeats and
> interval
> > >> of
> > >> > > > sending
> > >> > > > > > > heartbeats could be calculated on the server side (i.e.
> one
> > >> third
> > >> > > of
> > >> > > > > idle
> > >> > > > > > > timeout) and sent to the client during handshake.
> > >> > > > > > > Also we can introduce something like a negotiation
> mechanism
> > >> as
> > >> > in
> > >> > > > > > > zookeeper.
> > >> > > > > >
> > >> > > > > > I tend to agree with Maksim here, let's keep it simple and
> > >> > explicit.
> > >> > > > > > Log a warning, but don't do anything clever.
> > >> > > > > >
> > >> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <
> > >> > ivandasch@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > >> idleTimeout already exists, I don't think we should
> > change
> > >> the
> > >> > > way
> > >> > > > > it
> > >> > > > > > > works (or did I misunderstand you?)
> > >> > > > > > > If we use new approach, we can reduce this timeout. But
> this
> > >> can
> > >> > > > affect
> > >> > > > > > old
> > >> > > > > > > clients.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > Also, let's think about that sending heartbeats and
> interval
> > >> of
> > >> > > > sending
> > >> > > > > > > heartbeats could be calculated on the server side (i.e.
> one
> > >> third
> > >> > > of
> > >> > > > > idle
> > >> > > > > > > timeout) and sent to the client
> > >> > > > > > > during handshake.
> > >> > > > > > > Also we can introduce something like a negotiation
> mechanism
> > >> as
> > >> > in
> > >> > > > > > > zookeeper.
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <
> > >> > ptupitsyn@apache.org
> > >> > > >:
> > >> > > > > > >
> > >> > > > > > > > Igor,
> > >> > > > > > > >
> > >> > > > > > > > > Maybe clients should pass this information on to the
> > >> > handshake.
> > >> > > > > > > >
> > >> > > > > > > > Do you think we should log a mismatched timeout warning
> on
> > >> the
> > >> > > > > server,
> > >> > > > > > > not
> > >> > > > > > > > on the client?
> > >> > > > > > > > Or should we do both?
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and
> > some
> > >> > other
> > >> > > > > > details
> > >> > > > > > > > discussed above.
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <
> > >> isapego@apache.org
> > >> > >
> > >> > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Feature seems useful for me as it makes connection
> > >> management
> > >> > > > more
> > >> > > > > > > robust
> > >> > > > > > > > > and
> > >> > > > > > > > > predictable.
> > >> > > > > > > > >
> > >> > > > > > > > > I agree with Pavel, that we should print warning when
> > >> > heartbeat
> > >> > > > > > period
> > >> > > > > > > is
> > >> > > > > > > > > larger than
> > >> > > > > > > > > idle timeout, but I see a problem here as idle timeout
> > is
> > >> > > > > configured
> > >> > > > > > on
> > >> > > > > > > > > server and is not
> > >> > > > > > > > > known to clients, while heartbeats configured on
> clients
> > >> and
> > >> > > > their
> > >> > > > > > > period
> > >> > > > > > > > > is not known
> > >> > > > > > > > > to the server. Maybe clients should pass this
> > information
> > >> on
> > >> > to
> > >> > > > the
> > >> > > > > > > > > handshake.
> > >> > > > > > > > >
> > >> > > > > > > > > Regarding Python and PHP clients - can not we use some
> > >> kind
> > >> > of
> > >> > > > > timers
> > >> > > > > > > to
> > >> > > > > > > > > implement
> > >> > > > > > > > > this feature?
> > >> > > > > > > > >
> > >> > > > > > > > > Best Regards,
> > >> > > > > > > > > Igor
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > >> > > > > ptupitsyn@apache.org>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Maksim, agree. Let's not be too clever and only log
> a
> > >> > > warning.
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > >> > > > > > ptupitsyn@apache.org>
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we
> > >> should
> > >> > > > > change
> > >> > > > > > > the
> > >> > > > > > > > > way
> > >> > > > > > > > > > > it works (or did I misunderstand you?)
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Of course, enabling heartbeats means that
> otherwise
> > >> idle
> > >> > > > > clients
> > >> > > > > > > will
> > >> > > > > > > > > no
> > >> > > > > > > > > > > longer be disconnected by the server.
> > >> > > > > > > > > > > I think we should cross-link those properties in
> the
> > >> > > > > > documentation
> > >> > > > > > > > and
> > >> > > > > > > > > > > explain this behavior.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <
> > >> > > > > > > ivandasch@gmail.com>
> > >> > > > > > > > > > > wrote:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > >> >>3. Already implemented: when
> > >> > > > > > > > > ClientConnectorConfiguration#idleTimeout
> > >> > > > > > > > > > is
> > >> > > > > > > > > > >> not zero, server disconnects idle clients
> > >> > > > > > > > > > >> >>
> > >> > > > > > > > > > >> But I suppose it would be great to have:
> > >> > > > > > > > > > >> 1. If client supports keep alive, use idleTimeout
> > >> > > > > > > > > > >> 2. If not, do not use it.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> But I am not sure if it is correct or not.
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> On Mon, Feb 7, 2022 at 16:01, Maksim Timonin <
> > >> > > > > > > > timoninmaxim@apache.org
> > >> > > > > > > > > >:
> > >> > > > > > > > > > >>
> > >> > > > > > > > > > >> > I believe explicit is better than implicit :)
> > Also
> > >> in
> > >> > > case
> > >> > > > > of
> > >> > > > > > > > > dynamic
> > >> > > > > > > > > > >> > calculation of timeout, it can change
> > dynamically,
> > >> for
> > >> > > > > example
> > >> > > > > > > > > > >> restarting a
> > >> > > > > > > > > > >> > cluster with different configuration should
> > >> > reconfigure
> > >> > > > > > clients
> > >> > > > > > > > too.
> > >> > > > > > > > > > >> Looks
> > >> > > > > > > > > > >> > complicated.
> > >> > > > > > > > > > >> >
> > >> > > > > > > > > > >> > My vote for WARN + javadocs with mention of
> this
> > >> > issue.
> > >> > > > > > > > > > >> >
> > >> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <
> > >> > > > > > > > ptupitsyn@apache.org
> > >> > > > > > > > > >
> > >> > > > > > > > > > >> > wrote:
> > >> > > > > > > > > > >> >
> > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for
> > clients
> > >> > that
> > >> > > > > > > configure
> > >> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout
> on
> > >> the
> > >> > > > server
> > >> > > > > > > side?
> > >> > > > > > > > > > >> > >
> > >> > > > > > > > > > >> > > I think we should either log a WARN, or
> > retrieve
> > >> > > > > idleTimeout
> > >> > > > > > > > from
> > >> > > > > > > > > > >> server
> > >> > > > > > > > > > >> > > and configure heartbeatTimeout accordingly
> > (e.g.
> > >> > > divide
> > >> > > > by
> > >> > > > > > 2).
> > >> > > > > > > > > > >> > > Thoughts?
> > >> > > > > > > > > > >> > >
> > >> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim
> Timonin <
> > >> > > > > > > > > > >> timoninmaxim@apache.org>
> > >> > > > > > > > > > >> > > wrote:
> > >> > > > > > > > > > >> > >
> > >> > > > > > > > > > >> > > > Hi Pavel,
> > >> > > > > > > > > > >> > > >
> > >> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that
> the
> > >> flag
> > >> > of
> > >> > > > > > changed
> > >> > > > > > > > > > >> topology
> > >> > > > > > > > > > >> > is
> > >> > > > > > > > > > >> > > > lazy. Also I missed that the keepAlive
> > setting
> > >> is
> > >> > > > > > configured
> > >> > > > > > > > on
> > >> > > > > > > > > > the
> > >> > > > > > > > > > >> > > client
> > >> > > > > > > > > > >> > > > side (alternatively to idleTimeout that is
> on
> > >> the
> > >> > > > server
> > >> > > > > > > > side).
> > >> > > > > > > > > > >> > > >
> > >> > > > > > > > > > >> > > > Now I understand, this feature can be
> helpful
> > >> > then.
> > >> > > > > Every
> > >> > > > > > > > client
> > >> > > > > > > > > > can
> > >> > > > > > > > > > >> > > > configure itself in case it's possible to
> be
> > >> idle
> > >> > > > > > sometimes,
> > >> > > > > > > > and
> > >> > > > > > > > > > >> choose
> > >> > > > > > > > > > >> > > > an appropriate timeout by itself too. And
> by
> > >> > default
> > >> > > > the
> > >> > > > > > > > feature
> > >> > > > > > > > > > >> should
> > >> > > > > > > > > > >> > > be
> > >> > > > > > > > > > >> > > > disabled.
> > >> > > > > > > > > > >> > > >
> > >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for
> > clients
> > >> > that
> > >> > > > > > > configure
> > >> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout
> on
> > >> the
> > >> > > > server
> > >> > > > > > > side?
> > >> > > > > > > > > > >> > > >
> > >> > > > > > > > > > >> > > >
> > >> > > > > > > > > > >> > > >
> > >> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel
> > Tupitsyn <
> > >> > > > > > > > > > ptupitsyn@apache.org
> > >> > > > > > > > > > >> >
> > >> > > > > > > > > > >> > > > wrote:
> > >> > > > > > > > > > >> > > >
> > >> > > > > > > > > > >> > > > > Ivan,
> > >> > > > > > > > > > >> > > > >
> > >> > > > > > > > > > >> > > > > I suggest the following:
> > >> > > > > > > > > > >> > > > >
> > >> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag,
> > >> which
> > >> > > means
> > >> > > > > it
> > >> > > > > > > > > accepts
> > >> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > >> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the
> > >> > connection
> > >> > > is
> > >> > > > > > idle
> > >> > > > > > > > for
> > >> > > > > > > > > a
> > >> > > > > > > > > > >> > > > > certain period of time
> > >> > > > > > > > > > >> > > > > 3. Already implemented: when
> > >> > > > > > > > > > >> ClientConnectorConfiguration#idleTimeout
> > >> > > > > > > > > > >> > > is
> > >> > > > > > > > > > >> > > > > not zero, server disconnects idle clients
> > >> > > > > > > > > > >> > > > >
> > >> > > > > > > > > > >> > > > > This way we don't need server->client
> > >> > keepalives,
> > >> > > as
> > >> > > > > you
> > >> > > > > > > > > > correctly
> > >> > > > > > > > > > >> > > noted.
> > >> > > > > > > > > > >> > > > >
> > >> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan
> > >> Daschinsky
> > >> > <
> > >> > > > > > > > > > >> ivandasch@gmail.com
> > >> > > > > > > > > > >> > >
> > >> > > > > > > > > > >> > > > > wrote:
> > >> > > > > > > > > > >> > > > >
> > >> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > >> > > > > > > > > > >> > > > > > 1. Client send in handshake flag, that
> it
> > >> > > supports
> > >> > > > > > > > > KEEP_ALIVE
> > >> > > > > > > > > > >> > feature
> > >> > > > > > > > > > >> > > > and
> > >> > > > > > > > > > >> > > > > > server takes it into account.
> > >> > > > > > > > > > >> > > > > > 2. Each request of client can be
> > >> considered as
> > >> > > > > > > keep-alive
> > >> > > > > > > > > > ping.
> > >> > > > > > > > > > >> > > > > > 3. Client send failure should be
> > processed
> > >> > using
> > >> > > > > retry
> > >> > > > > > > > > policy.
> > >> > > > > > > > > > >> > > > > > 4. Server should not send keep-alive
> > >> packets,
> > >> > it
> > >> > > > is
> > >> > > > > > > > > redundant,
> > >> > > > > > > > > > >> but
> > >> > > > > > > > > > >> > > > server
> > >> > > > > > > > > > >> > > > > > should track requests from client and
> if
> > >> there
> > >> > > is
> > >> > > > no
> > >> > > > > > > > > requests
> > >> > > > > > > > > > >> from
> > >> > > > > > > > > > >> > > > client
> > >> > > > > > > > > > >> > > > > > with KEEP_ALIVE feature,
> > >> > > > > > > > > > >> > > > > > automatically close connection and free
> > >> > > resources.
> > >> > > > > > > > > > >> > > > > >
> > >> > > > > > > > > > >> > > > > > Similar approach is used in zookeeper
> > >> clients.
> > >> > > > > > > > > > >> > > > > >
> > >> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24, Pavel
> > >> Tupitsyn <
> > >> > > > > > > > > > >> ptupitsyn@apache.org
> > >> > > > > > > > > > >> > >:
> > >> > > > > > > > > > >> > > > > >
> > >> > > > > > > > > > >> > > > > > > Ivan,
> > >> > > > > > > > > > >> > > > > > >
> > >> > > > > > > > > > >> > > > > > > Ideally, the check should come from
> > both
> > >> > > sides.
> > >> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive
> > to
> > >> > > server
> > >> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive
> > to
> > >> > > client
> > >> > > > > > > > > > >> > > > > > >
> > >> > > > > > > > > > >> > > > > > > Feature flags will be added
> > accordingly,
> > >> so
> > >> > it
> > >> > > > is
> > >> > > > > > not
> > >> > > > > > > > > > >> necessary
> > >> > > > > > > > > > >> > to
> > >> > > > > > > > > > >> > > > > > > implement this in all thin clients.
> > >> > > > > > > > > > >> > > > > > >
> > >> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan
> > >> > > Daschinsky
> > >> > > > <
> > >> > > > > > > > > > >> > > ivandasch@gmail.com
> > >> > > > > > > > > > >> > > > >
> > >> > > > > > > > > > >> > > > > > > wrote:
> > >> > > > > > > > > > >> > > > > > >
> > >> > > > > > > > > > >> > > > > > > > I suppose it is great idea, but
> this
> > >> > > > > functionality
> > >> > > > > > > can
> > >> > > > > > > > > be
> > >> > > > > > > > > > >> hard
> > >> > > > > > > > > > >> > to
> > >> > > > > > > > > > >> > > > > > > implement
> > >> > > > > > > > > > >> > > > > > > > for some platforms. I.e. sync
> python
> > >> > client
> > >> > > or
> > >> > > > > php
> > >> > > > > > > > > (there
> > >> > > > > > > > > > >> is no
> > >> > > > > > > > > > >> > > > real
> > >> > > > > > > > > > >> > > > > > > > multithreading for python (GIL) and
> > >> php is
> > >> > > > > single
> > >> > > > > > > > > threaded
> > >> > > > > > > > > > >> by
> > >> > > > > > > > > > >> > > > > design).
> > >> > > > > > > > > > >> > > > > > > But
> > >> > > > > > > > > > >> > > > > > > > for async clients it is not very
> hard
> > >> to
> > >> > > > > > implement.
> > >> > > > > > > > > > >> > Nevertheless,
> > >> > > > > > > > > > >> > > > > this
> > >> > > > > > > > > > >> > > > > > > > feature should be optional, because
> > of
> > >> > > > possible
> > >> > > > > > > > > technical
> > >> > > > > > > > > > >> > > > > limitations.
> > >> > > > > > > > > > >> > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for
> > client
> > >> > side?
> > >> > > > Or
> > >> > > > > > > > servers
> > >> > > > > > > > > > can
> > >> > > > > > > > > > >> do
> > >> > > > > > > > > > >> > > some
> > >> > > > > > > > > > >> > > > > > > actions
> > >> > > > > > > > > > >> > > > > > > > if there is no activity from thin
> > >> client
> > >> > > (i.e.
> > >> > > > > > > closing
> > >> > > > > > > > > > >> context
> > >> > > > > > > > > > >> > > and
> > >> > > > > > > > > > >> > > > > free
> > >> > > > > > > > > > >> > > > > > > > resources such as queries' handles
> > and
> > >> so
> > >> > > on?)
> > >> > > > > > > > > > >> > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel
> > >> > Tupitsyn
> > >> > > <
> > >> > > > > > > > > > >> > > ptupitsyn@apache.org
> > >> > > > > > > > > > >> > > > >:
> > >> > > > > > > > > > >> > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > half-state is a possible
> > situation
> > >> > when
> > >> > > an
> > >> > > > > > > Ignite
> > >> > > > > > > > > node
> > >> > > > > > > > > > >> goes
> > >> > > > > > > > > > >> > > > down
> > >> > > > > > > > > > >> > > > > or
> > >> > > > > > > > > > >> > > > > > > > > somehow removes connection to a
> > thin
> > >> > > client
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > Half-open state is also possible
> > >> when,
> > >> > for
> > >> > > > > > > example,
> > >> > > > > > > > an
> > >> > > > > > > > > > >> > > > intermediate
> > >> > > > > > > > > > >> > > > > > > > router
> > >> > > > > > > > > > >> > > > > > > > > is rebooted [1].
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > This is what we seem to have
> > >> encountered
> > >> > > > with
> > >> > > > > > one
> > >> > > > > > > of
> > >> > > > > > > > > our
> > >> > > > > > > > > > >> > > > customers
> > >> > > > > > > > > > >> > > > > -
> > >> > > > > > > > > > >> > > > > > > they
> > >> > > > > > > > > > >> > > > > > > > > have a stable cluster, and
> > >> long-living
> > >> > > > > (multiple
> > >> > > > > > > > days)
> > >> > > > > > > > > > >> thin
> > >> > > > > > > > > > >> > > > client
> > >> > > > > > > > > > >> > > > > > > > > connections which can be idle for
> > >> some
> > >> > > time.
> > >> > > > > > > > > > >> > > > > > > > > And only when we send some data
> on
> > >> such
> > >> > an
> > >> > > > > idle
> > >> > > > > > > > > > >> connection do
> > >> > > > > > > > > > >> > > we
> > >> > > > > > > > > > >> > > > > > > discover
> > >> > > > > > > > > > >> > > > > > > > > that it is broken.
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > But with enabled (true by
> > default)
> > >> > > > > > > > > partitionAwareness
> > >> > > > > > > > > > >> > feature
> > >> > > > > > > > > > >> > > > > > clients
> > >> > > > > > > > > > >> > > > > > > > can
> > >> > > > > > > > > > >> > > > > > > > > be notified about topology
> changes
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy"
> > >> > > notification
> > >> > > > > in
> > >> > > > > > a
> > >> > > > > > > > form
> > >> > > > > > > > > > of
> > >> > > > > > > > > > >> a
> > >> > > > > > > > > > >> > > > > response
> > >> > > > > > > > > > >> > > > > > > > > message flag [2].
> > >> > > > > > > > > > >> > > > > > > > > You won't get one on an idle
> > >> connection.
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > the connections are removed on
> > the
> > >> > > server
> > >> > > > > side
> > >> > > > > > > by
> > >> > > > > > > > > > client
> > >> > > > > > > > > > >> > idle
> > >> > > > > > > > > > >> > > > > > timeout
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by
> > default.
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > is it OK to keep such
> connections
> > >> > alive
> > >> > > > for
> > >> > > > > a
> > >> > > > > > > long
> > >> > > > > > > > > > time
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > in the case of partition
> > awareness
> > >> > > > features
> > >> > > > > it
> > >> > > > > > > can
> > >> > > > > > > > > > lead
> > >> > > > > > > > > > >> to
> > >> > > > > > > > > > >> > > > > wasting
> > >> > > > > > > > > > >> > > > > > > TCP
> > >> > > > > > > > > > >> > > > > > > > > sockets on Ignite nodes, can't it
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > >> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM
> > Maksim
> > >> > > > Timonin
> > >> > > > > <
> > >> > > > > > > > > > >> > > > > > timoninmaxim@apache.org
> > >> > > > > > > > > > >> > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > wrote:
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > Thanks for starting this
> thread!
> > >> Can I
> > >> > > ask
> > >> > > > > > some
> > >> > > > > > > > > > >> questions
> > >> > > > > > > > > > >> > > here
> > >> > > > > > > > > > >> > > > to
> > >> > > > > > > > > > >> > > > > > get
> > >> > > > > > > > > > >> > > > > > > > the
> > >> > > > > > > > > > >> > > > > > > > > > feature more clearly?
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > As I understand it correctly,
> > >> > half-state
> > >> > > > is
> > >> > > > > a
> > >> > > > > > > > > possible
> > >> > > > > > > > > > >> > > > situation
> > >> > > > > > > > > > >> > > > > > when
> > >> > > > > > > > > > >> > > > > > > > an
> > >> > > > > > > > > > >> > > > > > > > > > Ignite node goes down or
> somehow
> > >> > removes
> > >> > > > > > > > connection
> > >> > > > > > > > > > to a
> > >> > > > > > > > > > >> > thin
> > >> > > > > > > > > > >> > > > > > client.
> > >> > > > > > > > > > >> > > > > > > > But
> > >> > > > > > > > > > >> > > > > > > > > > with enabled (true by default)
> > >> > > > > > > partitionAwareness
> > >> > > > > > > > > > >> feature
> > >> > > > > > > > > > >> > > > clients
> > >> > > > > > > > > > >> > > > > > can
> > >> > > > > > > > > > >> > > > > > > > be
> > >> > > > > > > > > > >> > > > > > > > > > notified about topology
> changes.
> > >> So,
> > >> > > there
> > >> > > > > are
> > >> > > > > > > > > > possible
> > >> > > > > > > > > > >> > > cases:
> > >> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a
> > single
> > >> > node.
> > >> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes
> connection
> > >> from
> > >> > > > > itself.
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > I like the idea for the case
> > with a
> > >> > > single
> > >> > > > > > node,
> > >> > > > > > > > as
> > >> > > > > > > > > it
> > >> > > > > > > > > > >> > helps
> > >> > > > > > > > > > >> > > > fail
> > >> > > > > > > > > > >> > > > > > > fast.
> > >> > > > > > > > > > >> > > > > > > > > > But is it OK to connect a
> client
> > >> to a
> > >> > > > single
> > >> > > > > > > node
> > >> > > > > > > > > > only?
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > For the second one: you mention
> > >> that a
> > >> > > > case
> > >> > > > > > for
> > >> > > > > > > > the
> > >> > > > > > > > > > >> second
> > >> > > > > > > > > > >> > > > option
> > >> > > > > > > > > > >> > > > > > is
> > >> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly idle
> > >> > connections
> > >> > > > are
> > >> > > > > > > > > > especially
> > >> > > > > > > > > > >> > > > > susceptible
> > >> > > > > > > > > > >> > > > > > > to
> > >> > > > > > > > > > >> > > > > > > > > this
> > >> > > > > > > > > > >> > > > > > > > > > behavior". If I understand
> > >> correctly
> > >> > the
> > >> > > > > > > > connections
> > >> > > > > > > > > > are
> > >> > > > > > > > > > >> > > > removed
> > >> > > > > > > > > > >> > > > > on
> > >> > > > > > > > > > >> > > > > > > the
> > >> > > > > > > > > > >> > > > > > > > > > server side by client idle
> > timeout.
> > >> > Can
> > >> > > we
> > >> > > > > > just
> > >> > > > > > > > > > >> configure
> > >> > > > > > > > > > >> > the
> > >> > > > > > > > > > >> > > > > idle
> > >> > > > > > > > > > >> > > > > > > > > timeout
> > >> > > > > > > > > > >> > > > > > > > > > for cases where we really need
> > >> keeping
> > >> > > > alive
> > >> > > > > > > idle
> > >> > > > > > > > > > >> > > connections?
> > >> > > > > > > > > > >> > > > > Are
> > >> > > > > > > > > > >> > > > > > > > there
> > >> > > > > > > > > > >> > > > > > > > > > any other cases with
> unexpectedly
> > >> > > dropped
> > >> > > > > > > > > connections?
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep
> > such
> > >> > > > > > connections
> > >> > > > > > > > > alive
> > >> > > > > > > > > > >> for a
> > >> > > > > > > > > > >> > > > long
> > >> > > > > > > > > > >> > > > > > > time?
> > >> > > > > > > > > > >> > > > > > > > > > Also in the case of partition
> > >> > awareness
> > >> > > > > > features
> > >> > > > > > > > it
> > >> > > > > > > > > > can
> > >> > > > > > > > > > >> > lead
> > >> > > > > > > > > > >> > > to
> > >> > > > > > > > > > >> > > > > > > wasting
> > >> > > > > > > > > > >> > > > > > > > > TCP
> > >> > > > > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't
> > it?
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > Thanks!
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM
> > >> Pavel
> > >> > > > > Tupitsyn
> > >> > > > > > <
> > >> > > > > > > > > > >> > > > > > ptupitsyn@apache.org>
> > >> > > > > > > > > > >> > > > > > > > > > wrote:
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > > >> Igniters,
> > >> > > > > > > > > > >> > > > > > > > > >>
> > >> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to
> > add
> > >> > > > heartbeat
> > >> > > > > > > > > messages
> > >> > > > > > > > > > to
> > >> > > > > > > > > > >> > the
> > >> > > > > > > > > > >> > > > thin
> > >> > > > > > > > > > >> > > > > > > > client
> > >> > > > > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x)
> and
> > >> let
> > >> > me
> > >> > > > know
> > >> > > > > > > your
> > >> > > > > > > > > > >> thoughts:
> > >> > > > > > > > > > >> > > > > > > > > >>
> > >> > > > > > > > > > >> > > > > > > > > >>
> > >> > > > > > > > > > >> > > > > > > > > >>
> > >> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > >> > > > > > > > > > >> > > > > > > > > >>
> > >> > > > > > > > > > >> > > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > >
> > >> > > > > > > > > > >> > > > > > > >
> > >> > > > > > > > > > >> > > > > > > >
> > >> > > > > > > > > > >> > > > > > > > --
> > >> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
To summarize, we add two properties to the ClientConfiguration:
bool heartbeatsEnabled = true;
long defaultHeartbeatInterval = 60_000; // Default 1 minute, used

Logic:
if (heartbeatsEnabled) {
  heartbeatInterval = serverIdleTimeout > 0
      ? serverIdleTimeout / 3
      : defaultHeartbeatInterval;
}
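
A minimal Java sketch of the logic above, assuming all values are in milliseconds. The class and method names are hypothetical and not taken from the PR, and returning 0 to mean "heartbeats disabled" is an assumption made only for this illustration:

```java
public class HeartbeatIntervalSketch {
    /** Default heartbeat interval from the summary above: 1 minute, in milliseconds. */
    public static final long DEFAULT_HEARTBEAT_INTERVAL = 60_000;

    /**
     * Resolves the effective heartbeat interval in milliseconds.
     * Returns 0 when heartbeats are disabled (a convention of this sketch only).
     */
    public static long resolve(boolean heartbeatsEnabled, long serverIdleTimeout) {
        if (!heartbeatsEnabled)
            return 0;

        // One third of the server idle timeout when it is set, otherwise the default.
        return serverIdleTimeout > 0 ? serverIdleTimeout / 3 : DEFAULT_HEARTBEAT_INTERVAL;
    }

    public static void main(String[] args) {
        System.out.println(resolve(true, 30_000));  // server idle timeout of 30 s
        System.out.println(resolve(true, 0));       // server idle timeout not set
        System.out.println(resolve(false, 30_000)); // heartbeats disabled
    }
}
```

With a 30-second server idle timeout this gives a 10-second heartbeat interval, so roughly three heartbeats fit into one idle window.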


Thoughts, objections?

On Tue, Feb 15, 2022 at 4:32 PM Ivan Daschinsky <iv...@gmail.com> wrote:

> Pavel, sorry, I made a mistake. The current behaviour is OK for me. This
> timeout cannot be changed on the server side at runtime. But we can simplify
> the protocol and use just one opcode and message.
>
> On Tue, Feb 15, 2022 at 2:54 PM Ivan Daschinsky <iv...@gmail.com> wrote:
>
> > > Idle timeout can't change, why send it back with every heartbeat
> > > response?
> > Maybe I am wrong, but this is the behaviour I see in the code. If I am
> > wrong, the current behaviour is OK for me.
> >
> >
> >
> > On Tue, Feb 15, 2022 at 2:00 PM Pavel Tupitsyn <pt...@apache.org> wrote:
> >
> >> Ivan, I mostly agree with your proposal, except this point:
> >>
> >> > Response to heartbeat request -- is idle timeout
> >> Idle timeout can't change, why send it back with every heartbeat
> >> response?
> >>
> >> > possible cases with cluster restart, upgrade
> >> In those cases, a new connection will be established, and we'll retrieve
> >> the new timeout after the handshake.
> >>
> >>
> >> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <timoninmaxim@apache.org>
> >> wrote:
> >>
> >> > Hi Ivan,
> >> >
> >> > The cases you described sound reasonable to me. Then the client should
> >> > just set the `keepAlive` flag, and it just works.
> >> >
> >> > So, there are 3 branches:
> >> > 1. Users don't configure keepAlive at all.
> >> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
> >> > 3. Users configure keepAlive (boolean).
> >> >
> >> > AFAIU, Pavel's proposal covers only the second case. But the 2nd and 3rd
> >> > cases don't actually conflict with each other. I think that for both
> >> > branches the cluster should respond with the idleTimeout value on every
> >> > keepalive request from the client, because the cluster can be restarted,
> >> > upgraded, etc. Clients should check every response for a changed
> >> > idleTimeout: in the 2nd case they log a WARN message, and in the 3rd
> >> > case they reconfigure themselves.
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <iv...@gmail.com>
> >> > wrote:
> >> >
> >> > > Regarding the discussion here [1]:
> >> > >
> >> > > I suppose that this feature, even though Pavel's initial intention was
> >> > > different, can drastically improve the usage pattern of thin clients
> >> > > and open up a lot of opportunities if the following is done.
> >> > >
> >> > > GridNioServer has a great feature: idle timeout. If a server does not
> >> > > receive anything from a client within the timeout, the client is
> >> > > kicked off. But some scenarios make this feature impossible to use:
> >> > > a. Multiple workers wait for batch tasks and have a relatively low
> >> > > request rate. These services will often be kicked off and must
> >> > > reconnect. To prevent this, the user must implement some kind of
> >> > > heartbeating himself.
> >> > > b. Quite often a user wants to implement the leader-follower pattern
> >> > > for HA services, so followers will also be considered idle. Kicking
> >> > > off these followers is not acceptable, so the user again has to
> >> > > implement heartbeating himself.
> >> > >
> >> > > My proposition is:
> >> > > 1. Add two settings: a flag to enable/disable heartbeats and an
> >> > > optional heartbeat timeout. Heartbeats are enabled by default, and the
> >> > > timeout is set to the default heartbeat timeout.
> >> > > 2. If the server and the client both support this feature, and
> >> > > heartbeats are not explicitly disabled on the client side:
> >> > > 3. The response to a heartbeat request is the idle timeout. If an idle
> >> > > timeout is set on the server side, set the heartbeat timeout to
> >> > > one-third of it; otherwise set it to the default or the specified
> >> > > value.
> >> > >
> >> > > Pros:
> >> > > 1. Easy to set up: just a flag on the client side and a timeout on the
> >> > > server side.
> >> > > 2. Hard to misconfigure, i.e. to set a heartbeat timeout that is not
> >> > > short enough to prevent the server from kicking the client out.
> >> > > 3. If the user just wants heartbeats without setting an idle timeout,
> >> > > heartbeats are on by default with a reasonable timeout.
> >> > >
> >> > > Cons:
> >> > > 1. If someone relies on the old behavior and just wants to drop
> >> > > clients on timeout, this will not work without reconfiguration:
> >> > > heartbeats must be disabled.
> >> > > But I cannot even imagine that someone would find this behaviour
> >> > > desirable. I strongly believe that this behaviour prevents users from
> >> > > using idleTimeout on the server side.
> >> > >
> >> > > [1] https://github.com/apache/ignite/pull/9817#discussion_r805628955
> >> > >
> >> > > On Fri, Feb 11, 2022 at 10:58 AM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > wrote:
> >> > >
> >> > > > I've prepared a PR, please have a look:
> >> > > > https://github.com/apache/ignite/pull/9817
> >> > > >
> >> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > > > I see potential in this feature, especially if we use something
> >> > > > > like continuous queries. Stale clients can consume a lot of
> >> > > > > resources, and it is worth kicking these clients out.
> >> > > > >
> >> > > > > On Mon, Feb 7, 2022 at 6:25 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > > If we use new approach, we can reduce this timeout. But this
> >> > > > > > > can affect old clients.
> >> > > > > >
> >> > > > > > idleTimeout is disabled by default, we are not going to change
> >> > > > > > this.
> >> > > > > >
> >> > > > > > > Also, let's think about that sending heartbeats and interval
> >> > > > > > > of sending heartbeats could be calculated on the server side
> >> > > > > > > (i.e. one third of idle timeout) and sent to the client during
> >> > > > > > > handshake.
> >> > > > > > > Also we can introduce something like a negotiation mechanism
> >> > > > > > > as in zookeeper.
> >> > > > > >
> >> > > > > > I tend to agree with Maksim here, let's keep it simple and
> >> > > > > > explicit.
> >> > > > > > Log a warning, but don't do anything clever.
> >> > > > > >
> >> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > >> idleTimeout already exists, I don't think we should change
> >> > > > > > > >> the way it works (or did I misunderstand you?)
> >> > > > > > > If we use the new approach, we can reduce this timeout. But
> >> > > > > > > this can affect old clients.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Also, let's consider that whether to send heartbeats and the
> >> > > > > > > sending interval could be calculated on the server side (e.g.
> >> > > > > > > one third of the idle timeout) and sent to the client during
> >> > > > > > > the handshake.
> >> > > > > > > We could also introduce something like the negotiation
> >> > > > > > > mechanism in ZooKeeper.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Mon, Feb 7, 2022 at 6:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Igor,
> >> > > > > > > >
> >> > > > > > > > > Maybe clients should pass this information on to the
> >> > > > > > > > > handshake.
> >> > > > > > > >
> >> > > > > > > > Do you think we should log a mismatched timeout warning on
> >> > > > > > > > the server, not on the client?
> >> > > > > > > > Or should we do both?
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some
> >> > > > > > > > other details discussed above.
> >> > > > > > > >
> >> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <isapego@apache.org>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > The feature seems useful to me, as it makes connection
> >> > > > > > > > > management more robust and predictable.
> >> > > > > > > > >
> >> > > > > > > > > I agree with Pavel that we should print a warning when the
> >> > > > > > > > > heartbeat period is larger than the idle timeout, but I
> >> > > > > > > > > see a problem here: the idle timeout is configured on the
> >> > > > > > > > > server and is not known to clients, while heartbeats are
> >> > > > > > > > > configured on clients and their period is not known to the
> >> > > > > > > > > server. Maybe clients should pass this information during
> >> > > > > > > > > the handshake.
> >> > > > > > > > >
> >> > > > > > > > > Regarding the Python and PHP clients: can't we use some
> >> > > > > > > > > kind of timers to implement this feature?
> >> > > > > > > > >
> >> > > > > > > > > Best Regards,
> >> > > > > > > > > Igor
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Maksim, agree. Let's not be too clever and only log a
> >> > > > > > > > > > warning.
> >> > > > > > > > > >
> >> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we
> >> > > > > > > > > > > should change the way it works (or did I misunderstand
> >> > > > > > > > > > > you?)
> >> > > > > > > > > > >
> >> > > > > > > > > > > Of course, enabling heartbeats means that otherwise
> >> > > > > > > > > > > idle clients will no longer be disconnected by the
> >> > > > > > > > > > > server.
> >> > > > > > > > > > > I think we should cross-link those properties in the
> >> > > > > > > > > > > documentation and explain this behavior.
> >> > > > > > > > > > >
> >> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > >> >> 3. Already implemented: when
> >> > > > > > > > > > >> >> ClientConnectorConfiguration#idleTimeout is not zero,
> >> > > > > > > > > > >> >> server disconnects idle clients
> >> > > > > > > > > > >> >>
> >> > > > > > > > > > >> But I suppose it would be great to have:
> >> > > > > > > > > > >> 1. If client supports keep alive, use idleTimeout
> >> > > > > > > > > > >> 2. If not, do not use it.
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> But I am not sure if it is correct or not.
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> On Mon, Feb 7, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> >> > > > > > > > > > >> wrote:
> >> > > > > > > > > > >>
> >> > > > > > > > > > >> > I believe explicit is better than implicit :) Also,
> >> > > > > > > > > > >> > if the timeout is calculated dynamically, it can
> >> > > > > > > > > > >> > change dynamically: for example, restarting a cluster
> >> > > > > > > > > > >> > with a different configuration would have to
> >> > > > > > > > > > >> > reconfigure clients too. That looks complicated.
> >> > > > > > > > > > >> >
> >> > > > > > > > > > >> > My vote is for WARN + javadocs that mention this
> >> > > > > > > > > > >> > issue.
> >> > > > > > > > > > >> >
> >> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > > > > >> > wrote:
> >> > > > > > > > > > >> >
> >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that
> >> > > > > > > > > > >> > > > configure keepAliveTimeout greater than idleTimeout
> >> > > > > > > > > > >> > > > on the server side?
> >> > > > > > > > > > >> > >
> >> > > > > > > > > > >> > > I think we should either log a WARN, or retrieve
> >> > > > > > > > > > >> > > idleTimeout from server and configure
> >> > > > > > > > > > >> > > heartbeatTimeout accordingly (e.g. divide by 2).
> >> > > > > > > > > > >> > > Thoughts?
> >> > > > > > > > > > >> > >
> >> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <timoninmaxim@apache.org>
> >> > > > > > > > > > >> > > wrote:
> >> > > > > > > > > > >> > >
> >> > > > > > > > > > >> > > > Hi Pavel,
> >> > > > > > > > > > >> > > >
> >> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the
> >> > > > > > > > > > >> > > > changed-topology flag is lazy. I also missed that
> >> > > > > > > > > > >> > > > the keepAlive setting is configured on the client
> >> > > > > > > > > > >> > > > side (unlike idleTimeout, which is on the server
> >> > > > > > > > > > >> > > > side).
> >> > > > > > > > > > >> > > >
> >> > > > > > > > > > >> > > > Now I understand, this feature can be helpful then.
> >> > > > > > > > > > >> > > > Every client that may be idle at times can
> >> > > > > > > > > > >> > > > configure itself and choose an appropriate timeout.
> >> > > > > > > > > > >> > > > And by default the feature should be disabled.
> >> > > > > > > > > > >> > > >
> >> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that
> >> > > > > > > > > > >> > > > configure keepAliveTimeout greater than idleTimeout
> >> > > > > > > > > > >> > > > on the server side?
> >> > > > > > > > > > >> > > >
> >> > > > > > > > > > >> > > >
> >> > > > > > > > > > >> > > >
> >> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > > > > >> > > > wrote:
> >> > > > > > > > > > >> > > >
> >> > > > > > > > > > >> > > > > Ivan,
> >> > > > > > > > > > >> > > > >
> >> > > > > > > > > > >> > > > > I suggest the following:
> >> > > > > > > > > > >> > > > >
> >> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which
> >> > > > > > > > > > >> > > > > means it accepts OP_KEEP_ALIVE empty message
> >> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection
> >> > > > > > > > > > >> > > > > is idle for a certain period of time
> >> > > > > > > > > > >> > > > > 3. Already implemented: when
> >> > > > > > > > > > >> > > > > ClientConnectorConfiguration#idleTimeout is not
> >> > > > > > > > > > >> > > > > zero, server disconnects idle clients
> >> > > > > > > > > > >> > > > >
> >> > > > > > > > > > >> > > > > This way we don't need server->client keepalives,
> >> > > > > > > > > > >> > > > > as you correctly noted.
> >> > > > > > > > > > >> > > > >
> >> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > > > > > > >> > > > > wrote:
> >> > > > > > > > > > >> > > > >
> >> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> >> > > > > > > > > > >> > > > > > 1. The client sends a flag in the handshake
> >> > > > > > > > > > >> > > > > > indicating that it supports the KEEP_ALIVE
> >> > > > > > > > > > >> > > > > > feature, and the server takes it into account.
> >> > > > > > > > > > >> > > > > > 2. Each client request can be considered a
> >> > > > > > > > > > >> > > > > > keep-alive ping.
> >> > > > > > > > > > >> > > > > > 3. A client send failure should be handled using
> >> > > > > > > > > > >> > > > > > the retry policy.
> >> > > > > > > > > > >> > > > > > 4. The server should not send keep-alive packets,
> >> > > > > > > > > > >> > > > > > it is redundant. But the server should track
> >> > > > > > > > > > >> > > > > > requests from the client, and if there are no
> >> > > > > > > > > > >> > > > > > requests from a client with the KEEP_ALIVE
> >> > > > > > > > > > >> > > > > > feature, automatically close the connection and
> >> > > > > > > > > > >> > > > > > free resources.
> >> > > > > > > > > > >> > > > > >
> >> > > > > > > > > > >> > > > > > A similar approach is used in ZooKeeper clients.
> >> > > > > > > > > > >> > > > > >
> >> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > > > > >> > > > > > wrote:
> >> > > > > > > > > > >> > > > > >
> >> > > > > > > > > > >> > > > > > > Ivan,
> >> > > > > > > > > > >> > > > > > >
> >> > > > > > > > > > >> > > > > > > Ideally, the check should come from both sides.
> >> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to server
> >> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to client
> >> > > > > > > > > > >> > > > > > >
> >> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it
> >> > > > > > > > > > >> > > > > > > is not necessary to implement this in all thin
> >> > > > > > > > > > >> > > > > > > clients.
> >> > > > > > > > > > >> > > > > > >
> >> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com>
> >> > > > > > > > > > >> > > > > > > wrote:
> >> > > > > > > > > > >> > > > > > >
> >> > > > > > > > > > >> > > > > > > > I suppose it is a great idea, but this
> >> > > > > > > > > > >> > > > > > > > functionality can be hard to implement on some
> >> > > > > > > > > > >> > > > > > > > platforms, e.g. the sync Python client or PHP
> >> > > > > > > > > > >> > > > > > > > (Python has no real multithreading because of
> >> > > > > > > > > > >> > > > > > > > the GIL, and PHP is single-threaded by design).
> >> > > > > > > > > > >> > > > > > > > For async clients it is not very hard to
> >> > > > > > > > > > >> > > > > > > > implement. Nevertheless, this feature should be
> >> > > > > > > > > > >> > > > > > > > optional because of possible technical
> >> > > > > > > > > > >> > > > > > > > limitations.
> >> > > > > > > > > > >> > > > > > > >
> >> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for the client
> >> > > > > > > > > > >> > > > > > > > side? Or can servers take some action if there
> >> > > > > > > > > > >> > > > > > > > is no activity from a thin client (e.g. closing
> >> > > > > > > > > > >> > > > > > > > the context and freeing resources such as query
> >> > > > > > > > > > >> > > > > > > > handles)?
> >> > > > > > > > > > >> > > > > > > >
> >> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09 AM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > > > > >> > > > > > > > wrote:
> >> > > > > > > > > > >> > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation when an
> >> > > > > > > > > > >> > > > > > > > > > Ignite node goes down or somehow removes
> >> > > > > > > > > > >> > > > > > > > > > connection to a thin client
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when, for
> >> > > > > > > > > > >> > > > > > > > > example, an intermediate router is rebooted [1].
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > This is what we seem to have encountered with
> >> > > > > > > > > > >> > > > > > > > > one of our customers - they have a stable
> >> > > > > > > > > > >> > > > > > > > > cluster, and long-living (multiple days) thin
> >> > > > > > > > > > >> > > > > > > > > client connections which can be idle for some
> >> > > > > > > > > > >> > > > > > > > > time.
> >> > > > > > > > > > >> > > > > > > > > And only when we send some data on such an idle
> >> > > > > > > > > > >> > > > > > > > > connection do we discover that it is broken.
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default)
> >> > > > > > > > > > >> > > > > > > > > > partitionAwareness feature clients can be
> >> > > > > > > > > > >> > > > > > > > > > notified about topology changes
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification in
> >> > > > > > > > > > >> > > > > > > > > a form of a response message flag [2].
> >> > > > > > > > > > >> > > > > > > > > You won't get one on an idle connection.
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > the connections are removed on the server side
> >> > > > > > > > > > >> > > > > > > > > > by client idle timeout
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > is it OK to keep such connections alive for a
> >> > > > > > > > > > >> > > > > > > > > > long time
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > in the case of partition awareness features it
> >> > > > > > > > > > >> > > > > > > > > > can lead to wasting TCP sockets on Ignite
> >> > > > > > > > > > >> > > > > > > > > > nodes, can't it
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> >> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> >> > > > > > > > > > >> > > > > > > > > wrote:
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask
> >> > > > > > > > > > >> > > > > > > > > > some questions here to understand the feature
> >> > > > > > > > > > >> > > > > > > > > > better?
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > As I understand it, half-state is a possible
> >> > > > > > > > > > >> > > > > > > > > > situation when an Ignite node goes down or
> >> > > > > > > > > > >> > > > > > > > > > somehow removes its connection to a thin
> >> > > > > > > > > > >> > > > > > > > > > client. But with the partitionAwareness
> >> > > > > > > > > > >> > > > > > > > > > feature enabled (true by default), clients can
> >> > > > > > > > > > >> > > > > > > > > > be notified about topology changes. So, there
> >> > > > > > > > > > >> > > > > > > > > > are two possible cases:
> >> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> >> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single
> >> > > > > > > > > > >> > > > > > > > > > node, as it helps fail fast. But is it OK to
> >> > > > > > > > > > >> > > > > > > > > > connect a client to a single node only?
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case
> >> > > > > > > > > > >> > > > > > > > > > for the second option is "Long-living and
> >> > > > > > > > > > >> > > > > > > > > > mostly idle connections are especially
> >> > > > > > > > > > >> > > > > > > > > > susceptible to this behavior". If I understand
> >> > > > > > > > > > >> > > > > > > > > > correctly, the connections are removed on the
> >> > > > > > > > > > >> > > > > > > > > > server side by the client idle timeout. Can we
> >> > > > > > > > > > >> > > > > > > > > > just configure the idle timeout for cases
> >> > > > > > > > > > >> > > > > > > > > > where we really need to keep idle connections
> >> > > > > > > > > > >> > > > > > > > > > alive? Are there any other cases with
> >> > > > > > > > > > >> > > > > > > > > > unexpectedly dropped connections?
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > I'm wondering, is it OK to keep such
> >> > > > > > > > > > >> > > > > > > > > > connections alive for a long time? Also, with
> >> > > > > > > > > > >> > > > > > > > > > the partition awareness feature it can lead to
> >> > > > > > > > > > >> > > > > > > > > > wasting TCP sockets on Ignite nodes, can't it?
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > Thanks!
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> >> > > > > > > > > > >> > > > > > > > > > wrote:
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > > >> Igniters,
> >> > > > > > > > > > >> > > > > > > > > >>
> >> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin client
> >> > > > > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> >> > > > > > > > > > >> > > > > > > > > >>
> >> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> >> > > > > > > > > > >> > > > > > > > > >>
> >> > > > > > > > > > >> > > > > > > > > >
> >> > > > > > > > > > >> > > > > > > > >
> >> > > > > > > > > > >> > > > > > > >
> >> > > > > > > > > > >> > > > > > > >
> >> > > > > > > > > > >> > > > > > > > --
> >> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
Let's think about unifying the two operations.

Pros.
1. Just one operation.
2. Possibility to change the idle timeout at runtime on the cluster (using
a distributed property)

Cons.
1. Extra 8 bytes (in my opinion, that is negligible)

As for me, fewer op codes and message formats are better.
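
To make the trade-off concrete, here is a rough sketch in Python. It is not taken from the PR or the IEP: the payload layout (a single signed 64-bit little-endian value), the default interval, and all function names are assumptions chosen for illustration. It shows a heartbeat response that carries the idle timeout in 8 bytes, and a client that caps its heartbeat interval at one third of that timeout without ever raising a smaller user-configured value.

```python
import struct

# Assumed default for illustration only; the real default may differ.
DEFAULT_HEARTBEAT_INTERVAL_MS = 30_000


def encode_heartbeat_response(idle_timeout_ms: int) -> bytes:
    """Hypothetical response payload: idle timeout as one signed 64-bit LE value."""
    return struct.pack("<q", idle_timeout_ms)


def decode_heartbeat_response(payload: bytes) -> int:
    """Read the idle timeout back from the hypothetical 8-byte payload."""
    (idle_timeout_ms,) = struct.unpack("<q", payload)
    return idle_timeout_ms


def effective_interval_ms(configured_ms: int, idle_timeout_ms: int) -> int:
    """Cap the interval at a third of the idle timeout; never increase it."""
    if idle_timeout_ms <= 0:
        # Idle timeout is disabled on the server: keep the configured value.
        return configured_ms
    return min(configured_ms, max(1, idle_timeout_ms // 3))


# One round trip: the server reports a 6 s idle timeout, the client adapts.
payload = encode_heartbeat_response(6_000)
interval = effective_interval_ms(DEFAULT_HEARTBEAT_INTERVAL_MS,
                                 decode_heartbeat_response(payload))
```

With a single operation the client could re-run `effective_interval_ms` on every response, which is what would allow the timeout to change at runtime; with two operations it would only do so once after the handshake.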

On Tue, Feb 15, 2022 at 16:32 Ivan Daschinsky <iv...@gmail.com> wrote:

> Pavel, sorry, I made a mistake. But the current behaviour is OK for me. This
> timeout cannot be changed on the server side at runtime. But we can simplify
> the protocol by using just one opcode and message.
>
> On Tue, Feb 15, 2022, 14:54 Ivan Daschinsky <iv...@gmail.com> wrote:
>
>> > Idle timeout can't change, why send it back with every heartbeat response?
>> Maybe I am wrong, but this is the behaviour I see in the code. If I am
>> wrong, that behaviour is OK for me.
>>
>>
>>
>> On Tue, Feb 15, 2022 at 14:00 Pavel Tupitsyn <pt...@apache.org> wrote:
>>
>>> Ivan, I mostly agree with your proposal, except this point:
>>>
>>> > Response to heartbeat request -- is idle timeout
>>> Idle timeout can't change, why send it back with every heartbeat
>>> response?
>>>
>>> > possible cases with cluster restart, upgrade
>>> In those cases, a new connection will be established, and we'll retrieve
>>> the new timeout after the handshake.
>>>
>>>
>>> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <timoninmaxim@apache.org> wrote:
>>>
>>> > Hi Ivan,
>>> >
>>> > Cases you described sound reasonable to me. Then the client should just set
>>> > up the `keepAlive` flag, and it just works.
>>> >
>>> > So, there are 3 branches:
>>> > 1. Users don't configure keepAlive at all.
>>> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
>>> > 3. Users configure keepAlive (boolean).
>>> >
>>> > AFAIU, Pavel's proposal covers the second case only. But actually the 2nd
>>> > and 3rd don't conflict with each other. I think that for both branches a
>>> > cluster should respond with the idleTimeout value on every keep-alive client
>>> > request, because there are possible cases with cluster restart, upgrade,
>>> > etc. Clients should check every response for a changed idleTimeout: in the
>>> > 2nd case, write a WARN message; in the 3rd, reconfigure themselves.
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <iv...@gmail.com>
>>> > wrote:
>>> >
>>> > > Regarding discussion here [1]
>>> > >
>>> > > I suppose that this feature, even though Pavel's initial intention was
>>> > > different, can drastically improve the usage pattern of thin clients and
>>> > > give a lot of opportunities if the following is done:
>>> > >
>>> > > 1. GridNioServer has a great feature -- idle timeout. If a server does not
>>> > > receive anything from a client, the client will be kicked off.
>>> > >     But there are some scenarios that make the use of this feature
>>> > > impossible:
>>> > > a. Multiple workers waiting for batch tasks with a relatively low request
>>> > > rate -- these services will often be kicked off and must reconnect.
>>> > > To prevent this behaviour, the user must implement some kind of
>>> > > heartbeating themselves.
>>> > > b. Quite often a user may want to implement the leader-follower pattern for
>>> > > services for HA, so followers will also be considered idle. Kicking off
>>> > > these followers is not acceptable, so the user must also implement
>>> > > heartbeating themselves.
>>> > >
>>> > > My proposition is:
>>> > > 1. Add two settings -- a flag to enable/disable heartbeats, and an optional
>>> > > heartbeat timeout. Set enable to true by default, and the timeout to the
>>> > > default heartbeat timeout.
>>> > > 2. If server and client both support this feature, and heartbeats are not
>>> > > explicitly disabled on client side:
>>> > > 3. Response to heartbeat request -- is idle timeout. If idle timeout is set
>>> > > on the server side, set heartbeat timeout to one-third of it; otherwise set
>>> > > it to the default or specified value.
>>> > >
>>> > > Pros:
>>> > > 1. Easy to set up -- just a flag on the client side and just a timeout on
>>> > > the server side.
>>> > > 2. Hard to misconfigure, i.e. to set a heartbeat timeout that is not short
>>> > > enough to prevent being kicked out by the server.
>>> > > 3. If the user just wants heartbeats without setting idle timeout --
>>> > > heartbeats are on by default and with a reasonable timeout.
>>> > >
>>> > > Cons:
>>> > > 1. If someone relies on the old behavior and just wants to drop their
>>> > > clients on timeout -- this will not work without reconfiguring; they must
>>> > > disable heartbeats.
>>> > > But I cannot even imagine that someone would find this behaviour desirable.
>>> > > I strongly believe that this behaviour prevents users from using
>>> > > idleTimeout on server side.
>>> > >
>>> > > [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
>>> > >
>>> > > On Fri, Feb 11, 2022 at 10:58 Pavel Tupitsyn <pt...@apache.org> wrote:
>>> > >
>>> > > > I've prepared a PR, please have a look:
>>> > > > https://github.com/apache/ignite/pull/9817
>>> > > >
>>> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <ivandasch@gmail.com>
>>> > > > wrote:
>>> > > >
>>> > > > > I see potential in this feature, especially if we use something like
>>> > > > > continuous queries. Stale clients can consume a lot of resources, and
>>> > > > > it is worth kicking these clients out.
>>> > > > >
>>> > > > > On Mon, Feb 7, 2022 at 18:25 Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>>> > > > >
>>> > > > > > > If we use the new approach, we can reduce this timeout. But this can
>>> > > > > > > affect old clients.
>>> > > > > >
>>> > > > > > idleTimeout is disabled by default, we are not going to change this.
>>> > > > > >
>>> > > > > > > Also, let's consider that heartbeat sending and the heartbeat interval
>>> > > > > > > could be calculated on the server side (i.e. one third of the idle
>>> > > > > > > timeout) and sent to the client during handshake.
>>> > > > > > > Also we can introduce something like a negotiation mechanism, as in
>>> > > > > > > ZooKeeper.
>>> > > > > >
>>> > > > > > I tend to agree with Maksim here, let's keep it simple and explicit.
>>> > > > > > Log a warning, but don't do anything clever.
>>> > > > > >
>>> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <ivandasch@gmail.com>
>>> > > > > > wrote:
>>> > > > > >
>>> > > > > > > >> idleTimeout already exists, I don't think we should change the way
>>> > > > > > > >> it works (or did I misunderstand you?)
>>> > > > > > > If we use the new approach, we can reduce this timeout. But this can
>>> > > > > > > affect old clients.
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > Also, let's consider that heartbeat sending and the heartbeat interval
>>> > > > > > > could be calculated on the server side (i.e. one third of the idle
>>> > > > > > > timeout) and sent to the client during handshake.
>>> > > > > > > Also we can introduce something like a negotiation mechanism, as in
>>> > > > > > > ZooKeeper.
>>> > > > > > >
>>> > > > > > >
>>> > > > > > > On Mon, Feb 7, 2022 at 18:05 Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>>> > > > > > >
>>> > > > > > > > Igor,
>>> > > > > > > >
>>> > > > > > > > > Maybe clients should pass this information on to the handshake.
>>> > > > > > > >
>>> > > > > > > > Do you think we should log a mismatched timeout warning on the server,
>>> > > > > > > > not on the client?
>>> > > > > > > > Or should we do both?
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other
>>> > > > > > > > details discussed above.
>>> > > > > > > >
>>> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <isapego@apache.org> wrote:
>>> > > > > > > >
>>> > > > > > > > > The feature seems useful to me as it makes connection management more
>>> > > > > > > > > robust and predictable.
>>> > > > > > > > >
>>> > > > > > > > > I agree with Pavel that we should print a warning when the heartbeat
>>> > > > > > > > > period is larger than the idle timeout, but I see a problem here: the
>>> > > > > > > > > idle timeout is configured on the server and is not known to clients,
>>> > > > > > > > > while heartbeats are configured on clients and their period is not
>>> > > > > > > > > known to the server. Maybe clients should pass this information on to
>>> > > > > > > > > the handshake.
>>> > > > > > > > >
>>> > > > > > > > > Regarding Python and PHP clients - can't we use some kind of timers to
>>> > > > > > > > > implement this feature?
>>> > > > > > > > >
>>> > > > > > > > > Best Regards,
>>> > > > > > > > > Igor
>>> > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
>>> > > > > > > > > wrote:
>>> > > > > > > > >
>>> > > > > > > > > > Maksim, agree. Let's not be too clever and only log a warning.
>>> > > > > > > > > >
>>> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
>>> > > > > > > > > > wrote:
>>> > > > > > > > > >
>>> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we should change the
>>> > > > > > > > > > > way it works (or did I misunderstand you?)
>>> > > > > > > > > > >
>>> > > > > > > > > > > Of course, enabling heartbeats means that otherwise idle clients will
>>> > > > > > > > > > > no longer be disconnected by the server.
>>> > > > > > > > > > > I think we should cross-link those properties in the documentation
>>> > > > > > > > > > > and explain this behavior.
>>> > > > > > > > > > >
>>> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com>
>>> > > > > > > > > > > wrote:
>>> > > > > > > > > > >
>>> > > > > > > > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout
>>> > > > > > > > > > >> >>is not zero, server disconnects idle clients
>>> > > > > > > > > > >> >>
>>> > > > > > > > > > >> But I suppose it would be great to have:
>>> > > > > > > > > > >> 1. If client supports keep alive, use idleTimeout
>>> > > > > > > > > > >> 2. If not, do not use it.
>>> > > > > > > > > > >>
>>> > > > > > > > > > >> But I am not sure if it is correct or not.
>>> > > > > > > > > > >>
>>> > > > > > > > > > >> On Mon, Feb 7, 2022 at 16:01 Maksim Timonin <timoninmaxim@apache.org> wrote:
>>> > > > > > > > > > >>
>>> > > > > > > > > > >> > I believe explicit is better than implicit :) Also, if the timeout is
>>> > > > > > > > > > >> > calculated dynamically, it can change dynamically: for example,
>>> > > > > > > > > > >> > restarting a cluster with a different configuration should
>>> > > > > > > > > > >> > reconfigure clients too. That looks complicated.
>>> > > > > > > > > > >> >
>>> > > > > > > > > > >> > My vote is for WARN + javadocs mentioning this issue.
>>> > > > > > > > > > >> >
>>> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <ptupitsyn@apache.org>
>>> > > > > > > > > > >> > wrote:
>>> > > > > > > > > > >> >
>>> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
>>> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
>>> > > > > > > > > > >> > >
>>> > > > > > > > > > >> > > I think we should either log a WARN, or retrieve idleTimeout from
>>> > > > > > > > > > >> > > server and configure heartbeatTimeout accordingly (e.g. divide by 2).
>>> > > > > > > > > > >> > > Thoughts?
>>> > > > > > > > > > >> > >
>>> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <timoninmaxim@apache.org>
>>> > > > > > > > > > >> > > wrote:
>>> > > > > > > > > > >> > >
>>> > > > > > > > > > >> > > > Hi Pavel,
>>> > > > > > > > > > >> > > >
>>> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the topology-changed flag is
>>> > > > > > > > > > >> > > > lazy. Also I missed that the keepAlive setting is configured on the
>>> > > > > > > > > > >> > > > client side (as opposed to idleTimeout, which is on the server side).
>>> > > > > > > > > > >> > > >
>>> > > > > > > > > > >> > > > Now I understand, this feature can be helpful then. Every client can
>>> > > > > > > > > > >> > > > configure itself in case it's possible to be idle sometimes, and
>>> > > > > > > > > > >> > > > choose an appropriate timeout by itself too. And by default the
>>> > > > > > > > > > >> > > > feature should be disabled.
>>> > > > > > > > > > >> > > >
>>> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
>>> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
>>> > > > > > > > > > >> > > >
>>> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
>>> > > > > > > > > > >> > > > wrote:
>>> > > > > > > > > > >> > > >
>>> > > > > > > > > > >> > > > > Ivan,
>>> > > > > > > > > > >> > > > >
>>> > > > > > > > > > >> > > > > I suggest the following:
>>> > > > > > > > > > >> > > > >
>>> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
>>> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
>>> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
>>> > > > > > > > > > >> > > > > certain period of time
>>> > > > > > > > > > >> > > > > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout
>>> > > > > > > > > > >> > > > > is not zero, server disconnects idle clients
>>> > > > > > > > > > >> > > > >
>>> > > > > > > > > > >> > > > > This way we don't need server->client keepalives, as you correctly
>>> > > > > > > > > > >> > > > > noted.
>>> > > > > > > > > > >> > > > >
>>> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <ivandasch@gmail.com>
>>> > > > > > > > > > >> > > > > wrote:
>>> > > > > > > > > > >> > > > >
>>> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
>>> > > > > > > > > > >> > > > > > 1. The client sends a flag in the handshake saying that it supports
>>> > > > > > > > > > >> > > > > > the KEEP_ALIVE feature, and the server takes it into account.
>>> > > > > > > > > > >> > > > > > 2. Each client request can be considered a keep-alive ping.
>>> > > > > > > > > > >> > > > > > 3. A client send failure should be processed using the retry policy.
>>> > > > > > > > > > >> > > > > > 4. The server should not send keep-alive packets, as that is
>>> > > > > > > > > > >> > > > > > redundant, but it should track requests from the client and, if
>>> > > > > > > > > > >> > > > > > there are no requests from a client with the KEEP_ALIVE feature,
>>> > > > > > > > > > >> > > > > > automatically close the connection and free resources.
>>> > > > > > > > > > >> > > > > >
>>> > > > > > > > > > >> > > > > > A similar approach is used in ZooKeeper clients.
>>> > > > > > > > > > >> > > > > >
>>> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24 Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>>> > > > > > > > > > >> > > > > >
>>> > > > > > > > > > >> > > > > > > Ivan,
>>> > > > > > > > > > >> > > > > > >
>>> > > > > > > > > > >> > > > > > > Ideally, the check should come from both sides.
>>> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to server
>>> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to client
>>> > > > > > > > > > >> > > > > > >
>>> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it is not necessary to
>>> > > > > > > > > > >> > > > > > > implement this in all thin clients.
>>> > > > > > > > > > >> > > > > > >
>>> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com>
>>> > > > > > > > > > >> > > > > > > wrote:
>>> > > > > > > > > > >> > > > > > >
>>> > > > > > > > > > >> > > > > > > > I suppose it is a great idea, but this functionality can be hard to
>>> > > > > > > > > > >> > > > > > > > implement for some platforms, e.g. the sync Python client or PHP
>>> > > > > > > > > > >> > > > > > > > (there is no real multithreading in Python (GIL), and PHP is
>>> > > > > > > > > > >> > > > > > > > single-threaded by design). But for async clients it is not very
>>> > > > > > > > > > >> > > > > > > > hard to implement. Nevertheless, this feature should be optional,
>>> > > > > > > > > > >> > > > > > > > because of possible technical limitations.
>>> > > > > > > > > > >> > > > > > > >
>>> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for the client side? Or can servers take
>>> > > > > > > > > > >> > > > > > > > some action if there is no activity from a thin client (e.g. closing
>>> > > > > > > > > > >> > > > > > > > the context and freeing resources such as query handles and so on)?
>>> > > > > > > > > > >> > > > > > > >
>>> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09 Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>>> > > > > > > > > > >> > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > Hi Maksim,
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation when an Ignite node goes down or
>>> > > > > > > > > > >> > > > > > > > > > somehow removes connection to a thin client
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when, for example, an intermediate
>>> > > > > > > > > > >> > > > > > > > > router is rebooted [1].
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > This is what we seem to have encountered with one of our customers -
>>> > > > > > > > > > >> > > > > > > > > they have a stable cluster, and long-living (multiple days) thin
>>> > > > > > > > > > >> > > > > > > > > client connections which can be idle for some time.
>>> > > > > > > > > > >> > > > > > > > > And only when we send some data on such an idle connection do we
>>> > > > > > > > > > >> > > > > > > > > discover that it is broken.
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness feature clients
>>> > > > > > > > > > >> > > > > > > > > > can be notified about topology changes
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification in a form of a response
>>> > > > > > > > > > >> > > > > > > > > message flag [2].
>>> > > > > > > > > > >> > > > > > > > > You won't get one on an idle connection.
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > the connections are removed on the server side by client idle timeout
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > is it OK to keep such connections alive for a long time
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > in the case of partition awareness features it can lead to wasting
>>> > > > > > > > > > >> > > > > > > > > > TCP sockets on Ignite nodes, can't it
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
>>> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
>>> > > > > > > > > > >> > > > > > > > > wrote:
>>> > > > > > > > > > >> > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here to
>>> > > > > > > > > > >> > > > > > > > > > understand the feature more clearly?
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > As I understand it, half-state is a possible situation when an
>>> > > > > > > > > > >> > > > > > > > > > Ignite node goes down or somehow removes connection to a thin client.
>>> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness feature clients
>>> > > > > > > > > > >> > > > > > > > > > can be notified about topology changes. So, there are possible cases:
>>> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
>>> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps fail
>>> > > > > > > > > > >> > > > > > > > > > fast. But is it OK to connect a client to a single node only?
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second option is
>>> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly idle connections are especially susceptible
>>> > > > > > > > > > >> > > > > > > > > > to this behavior". If I understand correctly, the connections are
>>> > > > > > > > > > >> > > > > > > > > > removed on the server side by client idle timeout. Can we just
>>> > > > > > > > > > >> > > > > > > > > > configure the idle timeout for cases where we really need to keep
>>> > > > > > > > > > >> > > > > > > > > > idle connections alive? Are there any other cases with unexpectedly
>>> > > > > > > > > > >> > > > > > > > > > dropped connections?
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep
>>> such
>>> > > > > > connections
>>> > > > > > > > > alive
>>> > > > > > > > > > >> for a
>>> > > > > > > > > > >> > > > long
>>> > > > > > > > > > >> > > > > > > time?
>>> > > > > > > > > > >> > > > > > > > > > Also in the case of partition
>>> > awareness
>>> > > > > > features
>>> > > > > > > > it
>>> > > > > > > > > > can
>>> > > > > > > > > > >> > lead
>>> > > > > > > > > > >> > > to
>>> > > > > > > > > > >> > > > > > > wasting
>>> > > > > > > > > > >> > > > > > > > > TCP
>>> > > > > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > Thanks!
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
>>> > > > > > > > > > >> > > > > > > > > > wrote:
>>> > > > > > > > > > >> > > > > > > > > >
>>> > > > > > > > > > >> > > > > > > > > >> Igniters,
>>> > > > > > > > > > >> > > > > > > > > >>
>>> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin client
>>> > > > > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
>>> > > > > > > > > > >> > > > > > > > > >>
>>> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive

-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
Pavel, sorry, I've made a mistake. But the current behaviour is OK for me. This
timeout cannot be changed on the server side at runtime. But we can simplify
the protocol and just use one opcode and message.
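
A minimal sketch (Java, hypothetical names; this is not Ignite's actual client
code) of the rule discussed in this thread: the idle timeout is read once per
connection, since it cannot change while the server is running, and the
heartbeat interval is capped at one third of it without ever raising a
user-specified value. Treating a non-positive idle timeout as "disabled" is an
assumption of the sketch.

```java
// Hypothetical sketch; class and method names are illustrative, not Ignite API.
final class HeartbeatIntervalSketch {
    /**
     * @param userIntervalMs      heartbeat interval configured on the client, in ms (must be positive)
     * @param serverIdleTimeoutMs idle timeout reported by the server, in ms;
     *                            a value <= 0 is assumed to mean "idle timeout disabled"
     * @return the interval the client should actually use, in ms
     */
    static long effectiveIntervalMs(long userIntervalMs, long serverIdleTimeoutMs) {
        if (userIntervalMs <= 0)
            throw new IllegalArgumentException("userIntervalMs must be positive");

        // No idle timeout on the server: nothing to adapt to.
        if (serverIdleTimeoutMs <= 0)
            return userIntervalMs;

        // One third of the idle timeout, but never less than 1 ms.
        long recommended = Math.max(1, serverIdleTimeoutMs / 3);

        // Only ever shrink the user's value, never enlarge it.
        return Math.min(userIntervalMs, recommended);
    }

    public static void main(String[] args) {
        // User asked for 30 s, server drops idle clients after 9 s.
        System.out.println(effectiveIntervalMs(30_000, 9_000));
        // User asked for 1 s, which is already short enough.
        System.out.println(effectiveIntervalMs(1_000, 9_000));
    }
}
```

Because the value is computed once after the handshake, a cluster restart or
upgrade is covered by the reconnect: the new connection retrieves the new
timeout and recomputes the interval.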

Tue, Feb 15, 2022, 14:54 Ivan Daschinsky <iv...@gmail.com>:

> > Idle timeout can't change, why send it back with every heartbeat
> > response?
> Maybe I am wrong, but this is the behaviour I see in the code. But if I am
> wrong, this behaviour is OK for me.
>
>
>
> Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <pt...@apache.org>:
>
>> Ivan, I mostly agree with your proposal, except this point:
>>
>> > Response to heartbeat request -- is idle timeout
>> Idle timeout can't change, why send it back with every heartbeat response?
>>
>> > possible cases with cluster restart, upgrade
>> In those cases, a new connection will be established, and we'll retrieve
>> the new timeout after the handshake.
>>
>>
>> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <ti...@apache.org>
>> wrote:
>>
>> > Hi Ivan,
>> >
>> > The cases you described sound reasonable to me. Then the client should
>> > just set the `keepAlive` flag, and it just works.
>> >
>> > So, there are 3 branches:
>> > 1. Users don't configure keepAlive at all.
>> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
>> > 3. Users configure keepAlive (boolean).
>> >
>> > AFAIU, Pavel's proposal covers the second case only. But actually the
>> > 2nd and 3rd don't conflict with each other. I think for both branches,
>> > the cluster should respond with the idleTimeout value on every
>> > keep-alive client request, because there are possible cases with cluster
>> > restart, upgrade, etc. Clients should check every response for a changed
>> > idleTimeout: in the 2nd case, write a WARN message; in the 3rd,
>> > reconfigure themselves.
>> >
>> >
>> >
>> >
>> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <iv...@gmail.com>
>> > wrote:
>> >
>> > > Regarding the discussion here [1]:
>> > >
>> > > I suppose that this feature, although Pavel's initial intention was
>> > > different, can drastically improve the usage pattern of thin clients
>> > > and open up a lot of opportunities if the following is done:
>> > >
>> > > 1. GridNioServer has a great feature -- idle timeout. If a server does
>> > > not receive anything from a client, the client will be kicked off.
>> > >     But there are some scenarios that make this feature impossible to
>> > > use:
>> > > a. Multiple workers waiting for batch tasks, with a relatively low
>> > > request rate -- these services will often be kicked off and must
>> > > reconnect. To prevent this behaviour, the user must implement some
>> > > kind of heartbeating themselves.
>> > > b. Quite often a user may want to implement the leader-follower pattern
>> > > for services for HA, so followers will also be considered idle.
>> > > Kicking off these followers is not acceptable, so the user must again
>> > > implement heartbeating themselves.
>> > >
>> > > My proposal is:
>> > > 1. Add two settings -- an enable/disable heartbeats flag and an optional
>> > > heartbeat timeout. Set the flag to true by default, and the timeout to
>> > > the default heartbeat timeout.
>> > > 2. If server and client both support this feature, and heartbeats are
>> > > not explicitly disabled on the client side:
>> > > 3. The response to a heartbeat request is the idle timeout. If the idle
>> > > timeout is set on the server side, set the heartbeat timeout to
>> > > one-third of it; otherwise set it to the default or specified value.
>> > >
>> > > Pros:
>> > > 1. Easy to set up -- just a flag on the client side and a timeout on
>> > > the server side.
>> > > 2. Hard to misconfigure, i.e. to set a heartbeat timeout that is not
>> > > short enough to prevent being kicked out by the server.
>> > > 3. If the user just wants heartbeats without setting an idle timeout,
>> > > heartbeats are on by default with a reasonable timeout.
>> > >
>> > > Cons:
>> > > 1. If someone relies on the old behavior and just wants to drop
>> > > clients on timeout, this will not work without reconfiguring: they
>> > > have to disable heartbeats.
>> > > But I cannot even imagine that someone would find this behaviour
>> > > desirable.
>> > > I strongly believe that this behaviour prevents users from using
>> > > idleTimeout on the server side.
>> > >
>> > > [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
>> > >
>> > > Fri, Feb 11, 2022 at 10:58, Pavel Tupitsyn <pt...@apache.org>:
>> > >
>> > > > I've prepared a PR, please have a look:
>> > > > https://github.com/apache/ignite/pull/9817
>> > > >
>> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <ivandasch@gmail.com
>> >
>> > > > wrote:
>> > > >
>> > > > > I see potential in this feature, especially if we use something
>> > > > > like a continuous query. Stale clients can consume a lot of
>> > > > > resources, and it is worth kicking these clients out.
>> > > > >
>> > > > > Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > >
>> > > > > > > If we use new approach, we can reduce this timeout. But this can
>> > > > > > > affect old clients.
>> > > > > >
>> > > > > > idleTimeout is disabled by default, we are not going to change this.
>> > > > > >
>> > > > > > > Also, let's think about that sending heartbeats and interval of
>> > > > > > > sending heartbeats could be calculated on the server side (i.e. one
>> > > > > > > third of idle timeout) and sent to the client during handshake.
>> > > > > > > Also we can introduce something like a negotiation mechanism as in
>> > > > > > > zookeeper.
>> > > > > >
>> > > > > > I tend to agree with Maksim here, let's keep it simple and explicit.
>> > > > > > Log a warning, but don't do anything clever.
>> > > > > >
>> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <
>> > ivandasch@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > >> idleTimeout already exists, I don't think we should change the way
>> > > > > > > >> it works (or did I misunderstand you?)
>> > > > > > > If we use the new approach, we can reduce this timeout. But this can
>> > > > > > > affect old clients.
>> > > > > > >
>> > > > > > >
>> > > > > > > Also, let's consider that whether to send heartbeats, and the interval
>> > > > > > > between them, could be calculated on the server side (i.e. one third
>> > > > > > > of the idle timeout) and sent to the client during the handshake.
>> > > > > > > We could also introduce something like a negotiation mechanism, as in
>> > > > > > > ZooKeeper.
>> > > > > > >
>> > > > > > >
>> > > > > > > Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > > > >
>> > > > > > > > Igor,
>> > > > > > > >
>> > > > > > > > > Maybe clients should pass this information on to the handshake.
>> > > > > > > >
>> > > > > > > > Do you think we should log a mismatched timeout warning on the
>> > > > > > > > server, not on the client?
>> > > > > > > > Or should we do both?
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other
>> > > > > > > > details discussed above.
>> > > > > > > >
>> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <
>> isapego@apache.org
>> > >
>> > > > > wrote:
>> > > > > > > >
>> > > > > > > > > The feature seems useful to me, as it makes connection management
>> > > > > > > > > more robust and predictable.
>> > > > > > > > >
>> > > > > > > > > I agree with Pavel that we should print a warning when the
>> > > > > > > > > heartbeat period is larger than the idle timeout, but I see a
>> > > > > > > > > problem here: the idle timeout is configured on the server and is
>> > > > > > > > > not known to clients, while heartbeats are configured on clients
>> > > > > > > > > and their period is not known to the server. Maybe clients should
>> > > > > > > > > pass this information on to the handshake.
>> > > > > > > > >
>> > > > > > > > > Regarding Python and PHP clients - can't we use some kind of
>> > > > > > > > > timers to implement this feature?
>> > > > > > > > >
>> > > > > > > > > Best Regards,
>> > > > > > > > > Igor
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
>> > > > > ptupitsyn@apache.org>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Maksim, agree. Let's not be too clever and only log a warning.
>> > > > > > > > > >
>> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
>> > > > > > ptupitsyn@apache.org>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we should change
>> > > > > > > > > > > the way it works (or did I misunderstand you?)
>> > > > > > > > > > >
>> > > > > > > > > > > Of course, enabling heartbeats means that otherwise idle clients
>> > > > > > > > > > > will no longer be disconnected by the server.
>> > > > > > > > > > > I think we should cross-link those properties in the
>> > > > > > > > > > > documentation and explain this behavior.
>> > > > > > > > > > >
>> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <
>> > > > > > > ivandasch@gmail.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout
>> > > > > > > > > > >> >>is not zero, server disconnects idle clients
>> > > > > > > > > > >> >>
>> > > > > > > > > > >> But I suppose it would be great to have:
>> > > > > > > > > > >> 1. If client supports keep alive, use idleTimeout
>> > > > > > > > > > >> 2. If not, do not use it.
>> > > > > > > > > > >>
>> > > > > > > > > > >> But I am not sure if it is correct or not.
>> > > > > > > > > > >>
>> > > > > > > > > > >> Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org>:
>> > > > > > > > > > >>
>> > > > > > > > > > >> > I believe explicit is better than implicit :) Also, if the timeout
>> > > > > > > > > > >> > is calculated dynamically, it can change dynamically: for example,
>> > > > > > > > > > >> > restarting a cluster with a different configuration would have to
>> > > > > > > > > > >> > reconfigure clients too. Looks complicated.
>> > > > > > > > > > >> >
>> > > > > > > > > > >> > My vote is for WARN + javadocs that mention this issue.
>> > > > > > > > > > >> >
>> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <
>> > > > > > > > ptupitsyn@apache.org
>> > > > > > > > > >
>> > > > > > > > > > >> > wrote:
>> > > > > > > > > > >> >
>> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
>> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
>> > > > > > > > > > >> > >
>> > > > > > > > > > >> > > I think we should either log a WARN, or retrieve idleTimeout from
>> > > > > > > > > > >> > > the server and configure heartbeatTimeout accordingly (e.g. divide
>> > > > > > > > > > >> > > by 2).
>> > > > > > > > > > >> > > Thoughts?
>> > > > > > > > > > >> > >
>> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <
>> > > > > > > > > > >> timoninmaxim@apache.org>
>> > > > > > > > > > >> > > wrote:
>> > > > > > > > > > >> > >
>> > > > > > > > > > >> > > > Hi Pavel,
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the changed-topology flag
>> > > > > > > > > > >> > > > is lazy. I also missed that the keepAlive setting is configured on
>> > > > > > > > > > >> > > > the client side (unlike idleTimeout, which is on the server side).
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > > > Now I understand that this feature can be helpful. Every client
>> > > > > > > > > > >> > > > that may sometimes be idle can configure itself and choose an
>> > > > > > > > > > >> > > > appropriate timeout by itself too. And the feature should be
>> > > > > > > > > > >> > > > disabled by default.
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
>> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <
>> > > > > > > > > > ptupitsyn@apache.org
>> > > > > > > > > > >> >
>> > > > > > > > > > >> > > > wrote:
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > > > > Ivan,
>> > > > > > > > > > >> > > > >
>> > > > > > > > > > >> > > > > I suggest the following:
>> > > > > > > > > > >> > > > >
>> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
>> > > > > > > > > > >> > > > > an empty OP_KEEP_ALIVE message
>> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
>> > > > > > > > > > >> > > > > certain period of time
>> > > > > > > > > > >> > > > > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout
>> > > > > > > > > > >> > > > > is not zero, server disconnects idle clients
>> > > > > > > > > > >> > > > >
>> > > > > > > > > > >> > > > > This way we don't need server->client keepalives, as you correctly
>> > > > > > > > > > >> > > > > noted.
>> > > > > > > > > > >> > > > >
>> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan
>> Daschinsky
>> > <
>> > > > > > > > > > >> ivandasch@gmail.com
>> > > > > > > > > > >> > >
>> > > > > > > > > > >> > > > > wrote:
>> > > > > > > > > > >> > > > >
>> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
>> > > > > > > > > > >> > > > > > 1. The client sends a flag in the handshake indicating that it
>> > > > > > > > > > >> > > > > > supports the KEEP_ALIVE feature, and the server takes it into account.
>> > > > > > > > > > >> > > > > > 2. Each client request can be considered a keep-alive ping.
>> > > > > > > > > > >> > > > > > 3. A client send failure should be processed using the retry policy.
>> > > > > > > > > > >> > > > > > 4. The server should not send keep-alive packets, as that is
>> > > > > > > > > > >> > > > > > redundant, but it should track requests from the client and, if
>> > > > > > > > > > >> > > > > > there are no requests from a client with the KEEP_ALIVE feature,
>> > > > > > > > > > >> > > > > > automatically close the connection and free resources.
>> > > > > > > > > > >> > > > > >
>> > > > > > > > > > >> > > > > > A similar approach is used in ZooKeeper clients.
>> > > > > > > > > > >> > > > > >
>> > > > > > > > > > >> > > > > > Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > > > > > > > >> > > > > >
>> > > > > > > > > > >> > > > > > > Ivan,
>> > > > > > > > > > >> > > > > > >
>> > > > > > > > > > >> > > > > > > Ideally, the check should come from both sides.
>> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to server
>> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to client
>> > > > > > > > > > >> > > > > > >
>> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it is not necessary to
>> > > > > > > > > > >> > > > > > > implement this in all thin clients.
>> > > > > > > > > > >> > > > > > >
>> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan
>> > > Daschinsky
>> > > > <
>> > > > > > > > > > >> > > ivandasch@gmail.com
>> > > > > > > > > > >> > > > >
>> > > > > > > > > > >> > > > > > > wrote:
>> > > > > > > > > > >> > > > > > >
>> > > > > > > > > > >> > > > > > > > I suppose it is a great idea, but this functionality can be hard to
>> > > > > > > > > > >> > > > > > > > implement for some platforms, e.g. the sync Python client or PHP
>> > > > > > > > > > >> > > > > > > > (there is no real multithreading in Python because of the GIL, and
>> > > > > > > > > > >> > > > > > > > PHP is single-threaded by design). But for async clients it is not
>> > > > > > > > > > >> > > > > > > > very hard to implement. Nevertheless, this feature should be
>> > > > > > > > > > >> > > > > > > > optional because of possible technical limitations.
>> > > > > > > > > > >> > > > > > > >
>> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for the client side? Or can servers take
>> > > > > > > > > > >> > > > > > > > some action if there is no activity from a thin client (i.e. closing
>> > > > > > > > > > >> > > > > > > > the context and freeing resources such as query handles and so on)?
>> > > > > > > > > > >> > > > > > > >
>> > > > > > > > > > >> > > > > > > > Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org>:
>> > > > > > > > > > >> > > > > > > >
>> > > > > > > > > > >> > > > > > > > > Hi Maksim,
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation when an Ignite node goes down or
>> > > > > > > > > > >> > > > > > > > > > somehow removes connection to a thin client
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when, for example, an intermediate
>> > > > > > > > > > >> > > > > > > > > router is rebooted [1].
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > This is what we seem to have encountered with one of our customers -
>> > > > > > > > > > >> > > > > > > > > they have a stable cluster, and long-living (multiple days) thin
>> > > > > > > > > > >> > > > > > > > > client connections which can be idle for some time.
>> > > > > > > > > > >> > > > > > > > > And only when we send some data on such an idle connection do we
>> > > > > > > > > > >> > > > > > > > > discover that it is broken.
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness feature
>> > > > > > > > > > >> > > > > > > > > > clients can be notified about topology changes
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification in the form of a
>> > > > > > > > > > >> > > > > > > > > response message flag [2].
>> > > > > > > > > > >> > > > > > > > > You won't get one on an idle connection.
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > the connections are removed on the server side by client idle
>> > > > > > > > > > >> > > > > > > > > > timeout
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > is it OK to keep such connections alive for a long time
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > in the case of partition awareness features it can lead to wasting
>> > > > > > > > > > >> > > > > > > > > > TCP sockets on Ignite nodes, can't it
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
>> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim
>> > > > Timonin
>> > > > > <
>> > > > > > > > > > >> > > > > > timoninmaxim@apache.org
>> > > > > > > > > > >> > > > > > > >
>> > > > > > > > > > >> > > > > > > > > wrote:
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here to get the
>> > > > > > > > > > >> > > > > > > > > > feature more clearly?
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > As I understand it correctly, half-state is a possible situation when an
>> > > > > > > > > > >> > > > > > > > > > Ignite node goes down or somehow removes connection to a thin client. But
>> > > > > > > > > > >> > > > > > > > > > with enabled (true by default) partitionAwareness feature clients can be
>> > > > > > > > > > >> > > > > > > > > > notified about topology changes. So, there are possible cases:
>> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
>> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps fail fast.
>> > > > > > > > > > >> > > > > > > > > > But is it OK to connect a client to a single node only?
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second option is
>> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly idle connections are especially susceptible to this
>> > > > > > > > > > >> > > > > > > > > > behavior". If I understand correctly the connections are removed on the
>> > > > > > > > > > >> > > > > > > > > > server side by client idle timeout. Can we just configure the idle timeout
>> > > > > > > > > > >> > > > > > > > > > for cases where we really need keeping alive idle connections? Are there
>> > > > > > > > > > >> > > > > > > > > > any other cases with unexpectedly dropped connections?
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such connections alive for a long time?
>> > > > > > > > > > >> > > > > > > > > > Also in the case of partition awareness features it can lead to wasting TCP
>> > > > > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > Thanks!
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM
>> Pavel
>> > > > > Tupitsyn
>> > > > > > <
>> > > > > > > > > > >> > > > > > ptupitsyn@apache.org>
>> > > > > > > > > > >> > > > > > > > > > wrote:
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > > >> Igniters,
>> > > > > > > > > > >> > > > > > > > > >>
>> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add
>> > > > heartbeat
>> > > > > > > > > messages
>> > > > > > > > > > to
>> > > > > > > > > > >> > the
>> > > > > > > > > > >> > > > thin
>> > > > > > > > > > >> > > > > > > > client
>> > > > > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and
>> let
>> > me
>> > > > know
>> > > > > > > your
>> > > > > > > > > > >> thoughts:
>> > > > > > > > > > >> > > > > > > > > >>
>> > > > > > > > > > >> > > > > > > > > >>
>> > > > > > > > > > >> > > > > > > > > >>
>> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
>> > > > > > > > > > >> > > > > > > > > >>
>> > > > > > > > > > >> > > > > > > > > >
>> > > > > > > > > > >> > > > > > > > >
>> > > > > > > > > > >> > > > > > > >
>> > > > > > > > > > >> > > > > > > >
>> > > > > > > > > > >> > > > > > > > --
>> > > > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > > > > > > >> > > > > > > >
>> > > > > > > > > > >> > > > > > >
>> > > > > > > > > > >> > > > > >
>> > > > > > > > > > >> > > > > >
>> > > > > > > > > > >> > > > > > --
>> > > > > > > > > > >> > > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > > > > > > >> > > > > >
>> > > > > > > > > > >> > > > >
>> > > > > > > > > > >> > > >
>> > > > > > > > > > >> > >
>> > > > > > > > > > >> >
>> > > > > > > > > > >>
>> > > > > > > > > > >>
>> > > > > > > > > > >> --
>> > > > > > > > > > >> Sincerely yours, Ivan Daschinskiy
>> > > > > > > > > > >>
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Sincerely yours, Ivan Daschinskiy
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Sincerely yours, Ivan Daschinskiy
>> > >
>> >
>>
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
> Idle timeout can't change, why send it back with every heartbeat response?
Maybe I am wrong, but that is the behaviour I see in the code. If I am
wrong, then this behaviour is OK with me.
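
For reference, here is a minimal sketch of the two interval policies debated in the quoted thread below: deriving the heartbeat interval as one third of the server idle timeout, and the simpler alternative of keeping the user's value and only logging a warning. The class name, method names and the 30-second default are assumptions made for illustration only; none of them are taken from the Ignite code base or from the PR.

```java
// Hypothetical helper for illustration only; not part of the Ignite API.
final class HeartbeatIntervalSketch {
    /** Assumed fallback when the user specifies nothing (30 s is an arbitrary choice). */
    static final long DEFAULT_INTERVAL_MS = 30_000L;

    /**
     * Picks the heartbeat interval following the "one third of idle timeout" idea.
     *
     * @param configuredMs  user-specified interval in ms, or 0 if not specified
     * @param idleTimeoutMs server idle timeout in ms, or 0 if it is disabled
     */
    static long effectiveIntervalMs(long configuredMs, long idleTimeoutMs) {
        // Idle timeout is enabled on the server: derive the interval from it.
        if (idleTimeoutMs > 0)
            return Math.max(1L, idleTimeoutMs / 3);

        // Idle timeout is disabled: use the specified value, else the default.
        return configuredMs > 0 ? configuredMs : DEFAULT_INTERVAL_MS;
    }

    /** Alternative policy: keep the user's value and only signal that a WARN is due. */
    static boolean shouldWarn(long configuredMs, long idleTimeoutMs) {
        return idleTimeoutMs > 0 && configuredMs > idleTimeoutMs;
    }
}
```

With a 9-second idle timeout the first policy yields a 3-second heartbeat regardless of the configured value, while the second policy leaves a 10-second interval untouched and merely reports that a warning should be logged.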



On Tue, Feb 15, 2022 at 14:00, Pavel Tupitsyn <pt...@apache.org> wrote:

> Ivan, I mostly agree with your proposal, except this point:
>
> > Response to heartbeat request -- is idle timeout
> Idle timeout can't change, why send it back with every heartbeat response?
>
> > possible cases with cluster restart, upgrade
> In those cases, a new connection will be established, and we'll retrieve
> the new timeout after the handshake.
>
>
> On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <ti...@apache.org>
> wrote:
>
> > Hi Ivan,
> >
> > The cases you described sound reasonable to me. Then the client should
> > just set the `keepAlive` flag, and it just works.
> >
> > So, there are 3 branches:
> > 1. Users don't configure keepAlive at all.
> > 2. Users configure keepAliveHeartbeatInterval (long, ms).
> > 3. Users configure keepAlive (boolean).
> >
> > AFAIU, Pavel's proposal covers the second case only. But the 2nd and
> > 3rd cases don't actually conflict with each other. I think that for
> > both branches the cluster should respond with the idleTimeout value
> > on every keep-alive client request, because the value can change
> > after a cluster restart, upgrade, etc. Clients should check every
> > response for a changed idleTimeout: in the 2nd case write a WARN
> > message, and in the 3rd case reconfigure themselves.
> >
> >
> >
> >
> > On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> > > Regarding discussion here [1]
> > >
> > > I suppose that this feature, although Pavel's initial intention was
> > > different, can drastically improve the usage pattern of thin clients
> > > and open up a lot of opportunities if the following is done:
> > >
> > > 1. GridNioServer has a great feature -- idle timeout. If a server does
> > > not receive anything from a client, the client is kicked off.
> > >     But there are some scenarios that make the use of this feature
> > > impossible:
> > > a. Multiple workers waiting for batch tasks with a relatively low
> > > request rate -- these services will often be kicked off and must
> > > reconnect. To prevent this behaviour, the user must implement a kind
> > > of heartbeating themselves.
> > > b. Quite often a user may want to implement the leader-follower pattern
> > > for services for HA, so followers will also be considered idle. Kicking
> > > off these followers is not acceptable, so the user also has to
> > > implement heartbeating themselves.
> > >
> > > My proposition is:
> > > 1. Add two settings -- a flag to enable/disable heartbeats, and an
> > > optional heartbeat timeout. Enable heartbeats by default, with the
> > > timeout set to a default heartbeat timeout.
> > > 2. If the server and client both support this feature, and heartbeats
> > > are not explicitly disabled on the client side:
> > > 3. The response to a heartbeat request is the idle timeout. If the idle
> > > timeout is set on the server side, set the heartbeat timeout to
> > > one-third of it instead of the default or specified value.
> > >
> > > Pros:
> > > 1. Easy to set up -- just a flag on the client side and a timeout on
> > > the server side.
> > > 2. Hard to configure improperly, i.e. to set a heartbeat timeout that
> > > is not short enough to prevent being kicked out by the server.
> > > 3. If the user just wants heartbeats without setting an idle timeout --
> > > heartbeats are on by default with a reasonable timeout.
> > >
> > > Cons:
> > > 1. If someone relies on the old behaviour and just wants to drop their
> > > clients on timeout -- this will not work without reconfiguration; they
> > > have to disable heartbeats.
> > > But I cannot even imagine that someone would find this behaviour
> > > desirable. I strongly believe that this behaviour prevents users from
> > > using idleTimeout on the server side.
> > >
> > > [1] --
> https://github.com/apache/ignite/pull/9817#discussion_r805628955
> > >
> > > On Fri, Feb 11, 2022 at 10:58, Pavel Tupitsyn <pt...@apache.org> wrote:
> > >
> > > > I've prepared a PR, please have a look:
> > > > https://github.com/apache/ignite/pull/9817
> > > >
> > > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <iv...@gmail.com>
> > > > wrote:
> > > >
> > > > > I see potential in this feature, especially if we use something like
> > > > > continuous queries. Stale clients can consume a lot of resources, and
> > > > > it is worth kicking these clients out.
> > > > >
> > > > > On Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <ptupitsyn@apache.org
> >:
> > > > >
> > > > > > > If we use new approach, we can reduce this timeout. But this
> can
> > > > affect
> > > > > > old clients.
> > > > > >
> > > > > > idleTimeout is disabled by default, we are not going to change
> > this.
> > > > > >
> > > > > > > Also, let's think about that sending heartbeats and interval of
> > > > sending
> > > > > > > heartbeats could be calculated on the server side (i.e. one
> third
> > > of
> > > > > idle
> > > > > > > timeout) and sent to the client during handshake.
> > > > > > > Also we can introduce something like a negotiation mechanism as
> > in
> > > > > > > zookeeper.
> > > > > >
> > > > > > I tend to agree with Maksim here, let's keep it simple and
> > explicit.
> > > > > > Log a warning, but don't do anything clever.
> > > > > >
> > > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <
> > ivandasch@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > >> idleTimeout already exists, I don't think we should change
> the
> > > way
> > > > > it
> > > > > > > works (or did I misunderstand you?)
> > > > > > > If we use new approach, we can reduce this timeout. But this
> can
> > > > affect
> > > > > > old
> > > > > > > clients.
> > > > > > >
> > > > > > >
> > > > > > > Also, let's consider that whether to send heartbeats, and the
> > > > > > > interval between them, could be calculated on the server side
> > > > > > > (e.g. one third of the idle timeout) and sent to the client
> > > > > > > during the handshake.
> > > > > > > We could also introduce something like the negotiation mechanism
> > > > > > > in ZooKeeper.
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <
> > ptupitsyn@apache.org
> > > >:
> > > > > > >
> > > > > > > > Igor,
> > > > > > > >
> > > > > > > > > Maybe clients should pass this information on to the
> > handshake.
> > > > > > > >
> > > > > > > > Do you think we should log a mismatched timeout warning on
> the
> > > > > server,
> > > > > > > not
> > > > > > > > on the client?
> > > > > > > > Or should we do both?
> > > > > > > >
> > > > > > > >
> > > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some
> > other
> > > > > > details
> > > > > > > > discussed above.
> > > > > > > >
> > > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <
> isapego@apache.org
> > >
> > > > > wrote:
> > > > > > > >
> > > > > > > > > The feature seems useful to me, as it makes connection management
> > > > > > > > > more robust and predictable.
> > > > > > > > >
> > > > > > > > > I agree with Pavel that we should print a warning when the
> > > > > > > > > heartbeat period is larger than the idle timeout, but I see a
> > > > > > > > > problem here: the idle timeout is configured on the server and is
> > > > > > > > > not known to clients, while heartbeats are configured on clients
> > > > > > > > > and their period is not known to the server. Maybe clients should
> > > > > > > > > pass this information in the handshake.
> > > > > > > > >
> > > > > > > > > Regarding the Python and PHP clients - can't we use some kind of
> > > > > > > > > timers to implement this feature?
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Igor
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > > > ptupitsyn@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Maksim, agree. Let's not be too clever and only log a
> > > warning.
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > > > > ptupitsyn@apache.org>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Ivan, idleTimeout already exists, I don't think we
> should
> > > > > change
> > > > > > > the
> > > > > > > > > way
> > > > > > > > > > > it works (or did I misunderstand you?)
> > > > > > > > > > >
> > > > > > > > > > > Of course, enabling heartbeats means that otherwise
> idle
> > > > > clients
> > > > > > > will
> > > > > > > > > no
> > > > > > > > > > > longer be disconnected by the server.
> > > > > > > > > > > I think we should cross-link those properties in the
> > > > > > documentation
> > > > > > > > and
> > > > > > > > > > > explain this behavior.
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <
> > > > > > > ivandasch@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> >>3. Already implemented: when
> > > > > > > > > ClientConnectorConfiguration#idleTimeout
> > > > > > > > > > is
> > > > > > > > > > >> not zero, server disconnects idle clients
> > > > > > > > > > >> >>
> > > > > > > > > > >> But I suppose it would be great to have:
> > > > > > > > > > >> 1. If client supports keep alive, use idleTimeout
> > > > > > > > > > >> 2. If not, do not use it.
> > > > > > > > > > >>
> > > > > > > > > > >> But I am not sure if it is correct or not.
> > > > > > > > > > >>
> > > > > > > > > > >> On Mon, Feb 7, 2022 at 16:01, Maksim Timonin <
> > > > > > > > timoninmaxim@apache.org
> > > > > > > > > >:
> > > > > > > > > > >>
> > > > > > > > > > >> > I believe explicit is better than implicit :) Also
> in
> > > case
> > > > > of
> > > > > > > > > dynamic
> > > > > > > > > > >> > calculation of timeout, it can change dynamically,
> for
> > > > > example
> > > > > > > > > > >> restarting a
> > > > > > > > > > >> > cluster with different configuration should
> > reconfigure
> > > > > > clients
> > > > > > > > too.
> > > > > > > > > > >> Looks
> > > > > > > > > > >> > complicated.
> > > > > > > > > > >> >
> > > > > > > > > > >> > My vote for WARN + javadocs with mention of this
> > issue.
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <
> > > > > > > > ptupitsyn@apache.org
> > > > > > > > > >
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients
> > that
> > > > > > > configure
> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the
> > > > server
> > > > > > > side?
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I think we should either log a WARN, or retrieve
> > > > > idleTimeout
> > > > > > > > from
> > > > > > > > > > >> server
> > > > > > > > > > >> > > and configure heartbeatTimeout accordingly (e.g.
> > > divide
> > > > by
> > > > > > 2).
> > > > > > > > > > >> > > Thoughts?
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <
> > > > > > > > > > >> timoninmaxim@apache.org>
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > Hi Pavel,
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the
> flag
> > of
> > > > > > changed
> > > > > > > > > > >> topology
> > > > > > > > > > >> > is
> > > > > > > > > > >> > > > lazy. Also I missed that the keepAlive setting
> is
> > > > > > configured
> > > > > > > > on
> > > > > > > > > > the
> > > > > > > > > > >> > > client
> > > > > > > > > > >> > > > side (alternatively to idleTimeout that is on
> the
> > > > server
> > > > > > > > side).
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Now I understand, this feature can be helpful
> > then.
> > > > > Every
> > > > > > > > client
> > > > > > > > > > can
> > > > > > > > > > >> > > > configure itself in case it's possible to be
> idle
> > > > > > sometimes,
> > > > > > > > and
> > > > > > > > > > >> choose
> > > > > > > > > > >> > > > an appropriate timeout by itself too. And by
> > default
> > > > the
> > > > > > > > feature
> > > > > > > > > > >> should
> > > > > > > > > > >> > > be
> > > > > > > > > > >> > > > disabled.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients
> > that
> > > > > > > configure
> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the
> > > > server
> > > > > > > side?
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <
> > > > > > > > > > ptupitsyn@apache.org
> > > > > > > > > > >> >
> > > > > > > > > > >> > > > wrote:
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > > Ivan,
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > I suggest the following:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it
> > > > > > > > > > >> > > > > accepts OP_KEEP_ALIVE empty message
> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle
> > > > > > > > > > >> > > > > for a certain period of time
> > > > > > > > > > >> > > > > 3. Already implemented: when
> > > > > > > > > > >> > > > > ClientConnectorConfiguration#idleTimeout is not zero,
> > > > > > > > > > >> > > > > server disconnects idle clients
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > This way we don't need server->client keepalives, as you
> > > > > > > > > > >> > > > > correctly noted.
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan
> Daschinsky
> > <
> > > > > > > > > > >> ivandasch@gmail.com
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > > wrote:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > > > > > > > > >> > > > > > 1. The client sends a flag in the handshake saying that it
> > > > > > > > > > >> > > > > > supports the KEEP_ALIVE feature, and the server takes it
> > > > > > > > > > >> > > > > > into account.
> > > > > > > > > > >> > > > > > 2. Each client request can be considered a keep-alive ping.
> > > > > > > > > > >> > > > > > 3. A client send failure should be processed using the
> > > > > > > > > > >> > > > > > retry policy.
> > > > > > > > > > >> > > > > > 4. The server should not send keep-alive packets, as that
> > > > > > > > > > >> > > > > > is redundant, but it should track requests from the client,
> > > > > > > > > > >> > > > > > and if there are no requests from a client with the
> > > > > > > > > > >> > > > > > KEEP_ALIVE feature, automatically close the connection and
> > > > > > > > > > >> > > > > > free resources.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > A similar approach is used in ZooKeeper clients.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn
> <
> > > > > > > > > > >> ptupitsyn@apache.org
> > > > > > > > > > >> > >:
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > > Ivan,
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Ideally, the check should come from both
> > > sides.
> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to
> > > server
> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to
> > > client
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly,
> so
> > it
> > > > is
> > > > > > not
> > > > > > > > > > >> necessary
> > > > > > > > > > >> > to
> > > > > > > > > > >> > > > > > > implement this in all thin clients.
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan
> > > Daschinsky
> > > > <
> > > > > > > > > > >> > > ivandasch@gmail.com
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > > > wrote:
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > > I suppose it is great idea, but this
> > > > > functionality
> > > > > > > can
> > > > > > > > > be
> > > > > > > > > > >> hard
> > > > > > > > > > >> > to
> > > > > > > > > > >> > > > > > > implement
> > > > > > > > > > >> > > > > > > > for some platforms. I.e. sync python
> > client
> > > or
> > > > > php
> > > > > > > > > (there
> > > > > > > > > > >> is no
> > > > > > > > > > >> > > > real
> > > > > > > > > > >> > > > > > > > multithreading for python (GIL) and php
> is
> > > > > single
> > > > > > > > > threaded
> > > > > > > > > > >> by
> > > > > > > > > > >> > > > > design).
> > > > > > > > > > >> > > > > > > But
> > > > > > > > > > >> > > > > > > > for async clients it is not very hard to
> > > > > > implement.
> > > > > > > > > > >> > Nevertheless,
> > > > > > > > > > >> > > > > this
> > > > > > > > > > >> > > > > > > > feature should be optional, because of
> > > > possible
> > > > > > > > > technical
> > > > > > > > > > >> > > > > limitations.
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for client
> > side?
> > > > Or
> > > > > > > > servers
> > > > > > > > > > can
> > > > > > > > > > >> do
> > > > > > > > > > >> > > some
> > > > > > > > > > >> > > > > > > actions
> > > > > > > > > > >> > > > > > > > if there is no activity from thin client
> > > (i.e.
> > > > > > > closing
> > > > > > > > > > >> context
> > > > > > > > > > >> > > and
> > > > > > > > > > >> > > > > free
> > > > > > > > > > >> > > > > > > > resources such as queries' handles and
> so
> > > on?)
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel
> > Tupitsyn
> > > <
> > > > > > > > > > >> > > ptupitsyn@apache.org
> > > > > > > > > > >> > > > >:
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation
> > when
> > > an
> > > > > > > Ignite
> > > > > > > > > node
> > > > > > > > > > >> goes
> > > > > > > > > > >> > > > down
> > > > > > > > > > >> > > > > or
> > > > > > > > > > >> > > > > > > > > somehow removes connection to a thin
> > > client
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when,
> > for
> > > > > > > example,
> > > > > > > > an
> > > > > > > > > > >> > > > intermediate
> > > > > > > > > > >> > > > > > > > router
> > > > > > > > > > >> > > > > > > > > is rebooted [1].
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > This is what we seem to have
> encountered
> > > > with
> > > > > > one
> > > > > > > of
> > > > > > > > > our
> > > > > > > > > > >> > > > customers
> > > > > > > > > > >> > > > > -
> > > > > > > > > > >> > > > > > > they
> > > > > > > > > > >> > > > > > > > > have a stable cluster, and long-living
> > > > > (multiple
> > > > > > > > days)
> > > > > > > > > > >> thin
> > > > > > > > > > >> > > > client
> > > > > > > > > > >> > > > > > > > > connections which can be idle for some
> > > time.
> > > > > > > > > > >> > > > > > > > > And only when we send some data on
> such
> > an
> > > > > idle
> > > > > > > > > > >> connection do
> > > > > > > > > > >> > > we
> > > > > > > > > > >> > > > > > > discover
> > > > > > > > > > >> > > > > > > > > that it is broken.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default)
> > > > > > > > > partitionAwareness
> > > > > > > > > > >> > feature
> > > > > > > > > > >> > > > > > clients
> > > > > > > > > > >> > > > > > > > can
> > > > > > > > > > >> > > > > > > > > be notified about topology changes
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy"
> > > notification
> > > > > in
> > > > > > a
> > > > > > > > form
> > > > > > > > > > of
> > > > > > > > > > >> a
> > > > > > > > > > >> > > > > response
> > > > > > > > > > >> > > > > > > > > message flag [2].
> > > > > > > > > > >> > > > > > > > > You won't get one on an idle
> connection.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > the connections are removed on the
> > > server
> > > > > side
> > > > > > > by
> > > > > > > > > > client
> > > > > > > > > > >> > idle
> > > > > > > > > > >> > > > > > timeout
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > is it OK to keep such connections
> > alive
> > > > for
> > > > > a
> > > > > > > long
> > > > > > > > > > time
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > in the case of partition awareness
> > > > features
> > > > > it
> > > > > > > can
> > > > > > > > > > lead
> > > > > > > > > > >> to
> > > > > > > > > > >> > > > > wasting
> > > > > > > > > > >> > > > > > > TCP
> > > > > > > > > > >> > > > > > > > > sockets on Ignite nodes, can't it
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > [1]
> > > > > > > > > > >> > > > > > > > > https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > > > > > > >> > > > > > > > > [2]
> > > > > > > > > > >> > > > > > > > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim
> > > > Timonin
> > > > > <
> > > > > > > > > > >> > > > > > timoninmaxim@apache.org
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > > wrote:
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread!
> Can I
> > > ask
> > > > > > some
> > > > > > > > > > >> questions
> > > > > > > > > > >> > > here
> > > > > > > > > > >> > > > to
> > > > > > > > > > >> > > > > > get
> > > > > > > > > > >> > > > > > > > the
> > > > > > > > > > >> > > > > > > > > > feature more clearly?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > As I understand it correctly,
> > half-state
> > > > is
> > > > > a
> > > > > > > > > possible
> > > > > > > > > > >> > > > situation
> > > > > > > > > > >> > > > > > when
> > > > > > > > > > >> > > > > > > > an
> > > > > > > > > > >> > > > > > > > > > Ignite node goes down or somehow
> > removes
> > > > > > > > connection
> > > > > > > > > > to a
> > > > > > > > > > >> > thin
> > > > > > > > > > >> > > > > > client.
> > > > > > > > > > >> > > > > > > > But
> > > > > > > > > > >> > > > > > > > > > with enabled (true by default)
> > > > > > > partitionAwareness
> > > > > > > > > > >> feature
> > > > > > > > > > >> > > > clients
> > > > > > > > > > >> > > > > > can
> > > > > > > > > > >> > > > > > > > be
> > > > > > > > > > >> > > > > > > > > > notified about topology changes. So,
> > > there
> > > > > are
> > > > > > > > > > possible
> > > > > > > > > > >> > > cases:
> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single
> > node.
> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection
> from
> > > > > itself.
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a
> > > single
> > > > > > node,
> > > > > > > > as
> > > > > > > > > it
> > > > > > > > > > >> > helps
> > > > > > > > > > >> > > > fail
> > > > > > > > > > >> > > > > > > fast.
> > > > > > > > > > >> > > > > > > > > > But is it OK to connect a client to
> a
> > > > single
> > > > > > > node
> > > > > > > > > > only?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > For the second one: you mention
> that a
> > > > case
> > > > > > for
> > > > > > > > the
> > > > > > > > > > >> second
> > > > > > > > > > >> > > > option
> > > > > > > > > > >> > > > > > is
> > > > > > > > > > >> > > > > > > > > > "Long-living and mostly idle
> > connections
> > > > are
> > > > > > > > > > especially
> > > > > > > > > > >> > > > > susceptible
> > > > > > > > > > >> > > > > > > to
> > > > > > > > > > >> > > > > > > > > this
> > > > > > > > > > >> > > > > > > > > > behavior". If I understand correctly
> > the
> > > > > > > > connections
> > > > > > > > > > are
> > > > > > > > > > >> > > > removed
> > > > > > > > > > >> > > > > on
> > > > > > > > > > >> > > > > > > the
> > > > > > > > > > >> > > > > > > > > > server side by client idle timeout.
> > Can
> > > we
> > > > > > just
> > > > > > > > > > >> configure
> > > > > > > > > > >> > the
> > > > > > > > > > >> > > > > idle
> > > > > > > > > > >> > > > > > > > > timeout
> > > > > > > > > > >> > > > > > > > > > for cases where we really need
> keeping
> > > > alive
> > > > > > > idle
> > > > > > > > > > >> > > connections?
> > > > > > > > > > >> > > > > Are
> > > > > > > > > > >> > > > > > > > there
> > > > > > > > > > >> > > > > > > > > > any other cases with unexpectedly
> > > dropped
> > > > > > > > > connections?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such
> > > > > > connections
> > > > > > > > > alive
> > > > > > > > > > >> for a
> > > > > > > > > > >> > > > long
> > > > > > > > > > >> > > > > > > time?
> > > > > > > > > > >> > > > > > > > > > Also in the case of partition
> > awareness
> > > > > > features
> > > > > > > > it
> > > > > > > > > > can
> > > > > > > > > > >> > lead
> > > > > > > > > > >> > > to
> > > > > > > > > > >> > > > > > > wasting
> > > > > > > > > > >> > > > > > > > > TCP
> > > > > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Thanks!
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > > > > >> > > > > > > > > > wrote:
> > > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > > >> > > > > > > > > >> Igniters,
> > > > > > > > > > > >> > > > > > > > > >>
> > > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin
> > > > > > > > > > > >> > > > > > > > > >> client protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > > > > > > > > >> > > > > > > > > >>
> > > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive


-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Ivan, I mostly agree with your proposal, except this point:

> Response to heartbeat request -- is idle timeout
Idle timeout can't change, why send it back with every heartbeat response?

> possible cases with cluster restart, upgrade
In those cases, a new connection will be established, and we'll retrieve
the new timeout after the handshake.
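
To make that concrete, here is a minimal sketch (Python, not code from the
PR) of a client that learns the idle timeout once per connection, right
after the handshake, and derives the heartbeat interval from it with the
one-third rule from the quoted proposal. The names heartbeat_interval_ms,
Connection and connect, and the 30-second default, are assumptions made up
for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default; the thread does not fix a concrete value.
DEFAULT_HEARTBEAT_INTERVAL_MS = 30_000


def heartbeat_interval_ms(idle_timeout_ms: int,
                          configured_ms: Optional[int] = None) -> int:
    """Derive the heartbeat interval for one connection.

    idle_timeout_ms is the server-side idle timeout; 0 means it is disabled.
    """
    if idle_timeout_ms > 0:
        # One-third rule from the quoted proposal.
        return max(1, idle_timeout_ms // 3)
    if configured_ms is not None:
        return configured_ms
    return DEFAULT_HEARTBEAT_INTERVAL_MS


@dataclass(frozen=True)
class Connection:
    idle_timeout_ms: int  # fixed for the lifetime of this connection
    heartbeat_ms: int


def connect(server_idle_timeout_ms: int,
            configured_ms: Optional[int] = None) -> Connection:
    """Stand-in for the handshake plus a one-time idle timeout request."""
    interval = heartbeat_interval_ms(server_idle_timeout_ms, configured_ms)
    return Connection(server_idle_timeout_ms, interval)


# The cluster is restarted with a different idleTimeout: the old socket is
# gone, so the client reconnects and picks up the new value.
before = connect(9_000)
after = connect(30_000)
print(before.heartbeat_ms, after.heartbeat_ms)
```

Since the value is read again on every new connection, nothing has to be
carried in the heartbeat response itself.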


On Tue, Feb 15, 2022 at 12:04 PM Maksim Timonin <ti...@apache.org>
wrote:

> Hi Ivan,
>
> Cases you described sound reasonable to me. Then the client should just set
> up the `keepAlive` flag, and it just works.
>
> So, there are 3 branches:
> 1. Users don't configure keepAlive at all.
> 2. Users configure keepAliveHeartbeatInterval (long, ms).
> 3. Users configure keepAlive (boolean).
>
> AFAIU, Pavel's proposal covers only the second case. But the 2nd and 3rd
> don't actually conflict with each other. I think that for both branches a
> cluster should respond with the idleTimeout value on every keep-alive
> client request, because there are possible cases with cluster restart,
> upgrade, etc. Clients should check every response for a changed
> idleTimeout: in the 2nd case write a WARN message, and in the 3rd
> reconfigure themselves.
>
>
>
>
> On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <iv...@gmail.com>
> wrote:
>
> > Regarding discussion here [1]
> >
> > I suppose that this feature, despite the fact that initial intention of
> > Pavel was different, can drastically
> > improve the usage pattern of thin clients and give a lot of opportunities
> > if the following is done:
> >
> > 1. GridNioServer has a great feature -- idle timeout. If a server does
> > not receive anything from a client within that time, the client is
> > kicked off.
> >     But there are some scenarios that make the use of this feature
> > impossible:
> > a. Multiple workers waiting for batch tasks with a relatively low request
> > rate -- these services will often be kicked off and must reconnect.
> > In order to prevent this behaviour, the user must implement a kind of
> > heartbeating themselves.
> > b. Quite often a user may want to implement the leader-follower pattern
> > for services for HA, so followers will also be considered idle. Kicking
> > off these followers is not acceptable, so the user also has to implement
> > heartbeating themselves.
> >
> > My proposition is:
> > 1. Add two flags -- enable/disable heartbeats, and a (very optional)
> > heartbeat timeout. Set enable to true by default, and the timeout to the
> > default heartbeat timeout.
> > 2. If server and client both support this feature, and heartbeats are not
> > explicitly disabled on client side:
> > 3. Response to heartbeat request -- is idle timeout. If idle timeout is
> > set on the server side, set heartbeat timeout to one-third of it instead
> > of the default or specified value.
> >
> > Pros:
> > 1. Easy to set up -- just a flag on the client side and a timeout on the
> > server side.
> > 2. Hard to configure improperly, i.e. to set a heartbeat timeout that is
> > not short enough to prevent being kicked out by the server.
> > 3. If the user just wants heartbeats without setting idle timeout --
> > heartbeats are on by default and with a reasonable timeout.
> >
> > Cons:
> > 1. If someone relies on the old behavior and just wants to drop clients
> > on timeout -- this will not work without reconfiguring; heartbeats have
> > to be disabled.
> > But I cannot even imagine that someone will find this behaviour
> > desirable.
> > I strongly believe that this behaviour prevents users from using
> > idleTimeout on server side.
> >
> > [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
> >
> > On Fri, Feb 11, 2022 at 10:58 AM Pavel Tupitsyn <pt...@apache.org> wrote:
> >
> > > I've prepared a PR, please have a look:
> > > https://github.com/apache/ignite/pull/9817
> > >
> > > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <iv...@gmail.com>
> > > wrote:
> > >
> > > > I see potential in this feature, especially if we use something like
> > > > continuous query. Stale clients can consume a lot of resources and it
> > > > is worth kicking these clients out.
> > > >
> > > > On Mon, Feb 7, 2022 at 6:25 PM Pavel Tupitsyn <pt...@apache.org> wrote:
> > > >
> > > > > > If we use new approach, we can reduce this timeout. But this can
> > > > > > affect old clients.
> > > > >
> > > > > idleTimeout is disabled by default, we are not going to change this.
> > > > >
> > > > > > Also, let's think about that sending heartbeats and interval of
> > > > > > sending heartbeats could be calculated on the server side (i.e. one
> > > > > > third of idle timeout) and sent to the client during handshake.
> > > > > > Also we can introduce something like a negotiation mechanism as in
> > > > > > zookeeper.
> > > > >
> > > > > I tend to agree with Maksim here, let's keep it simple and explicit.
> > > > > Log a warning, but don't do anything clever.
> > > > >
> > > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > >> idleTimeout already exists, I don't think we should change the
> > > > > > >> way it works (or did I misunderstand you?)
> > > > > > If we use new approach, we can reduce this timeout. But this can
> > > > > > affect old clients.
> > > > > >
> > > > > > Also, let's think about that sending heartbeats and interval of
> > > > > > sending heartbeats could be calculated on the server side (i.e. one
> > > > > > third of idle timeout) and sent to the client during handshake.
> > > > > > Also we can introduce something like a negotiation mechanism as in
> > > > > > zookeeper.
> > > > > >
> > > > > >
> > > > > > On Mon, Feb 7, 2022 at 6:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Igor,
> > > > > > >
> > > > > > > > Maybe clients should pass this information on to the handshake.
> > > > > > >
> > > > > > > Do you think we should log a mismatched timeout warning on the
> > > > > > > server, not on the client?
> > > > > > > Or should we do both?
> > > > > > >
> > > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other
> > > > > > > details discussed above.
> > > > > > >
> > > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <isapego@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Feature seems useful for me as it makes connection management
> > > > > > > > more robust and predictable.
> > > > > > > >
> > > > > > > > I agree with Pavel, that we should print warning when heartbeat
> > > > > > > > period is larger than idle timeout, but I see a problem here as
> > > > > > > > idle timeout is configured on server and is not known to clients,
> > > > > > > > while heartbeats configured on clients and their period is not
> > > > > > > > known to the server. Maybe clients should pass this information
> > > > > > > > on to the handshake.
> > > > > > > >
> > > > > > > > Regarding Python and PHP clients - can not we use some kind of
> > > > > > > > timers to implement this feature?
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Igor
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Maksim, agree. Let's not be too clever and only log a warning.
> > > > > > > > >
> > > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Ivan, idleTimeout already exists, I don't think we should change
> > > > > > > > > > the way it works (or did I misunderstand you?)
> > > > > > > > > >
> > > > > > > > > > Of course, enabling heartbeats means that otherwise idle clients
> > > > > > > > > > will no longer be disconnected by the server.
> > > > > > > > > > I think we should cross-link those properties in the documentation
> > > > > > > > > > and explain this behavior.
> > > > > > > > > >
> > > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout
> > > > > > > > > >> >>is not zero, server disconnects idle clients
> > > > > > > > > >> >>
> > > > > > > > > >> But I suppose it would be great to have:
> > > > > > > > > >> 1. If client supports keep alive, use idleTimeout
> > > > > > > > > >> 2. If not, do not use it.
> > > > > > > > > >>
> > > > > > > > > >> But I am not sure if it is correct or not.
> > > > > > > > > >>
> > > > > > > > > > >> On Mon, Feb 7, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> > I believe explicit is better than implicit :) Also in case of
> > > > > > > > > > >> > dynamic calculation of timeout, it can change dynamically, for
> > > > > > > > > > >> > example restarting a cluster with different configuration should
> > > > > > > > > > >> > reconfigure clients too. Looks complicated.
> > > > > > > > > > >> >
> > > > > > > > > > >> > My vote for WARN + javadocs with mention of this issue.
> > > > > > > > > >> >
> > > > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > > > >> > wrote:
> > > > > > > > > > >> >
> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I think we should either log a WARN, or retrieve idleTimeout from
> > > > > > > > > > >> > > server and configure heartbeatTimeout accordingly (e.g. divide
> > > > > > > > > > >> > > by 2).
> > > > > > > > > > >> > > Thoughts?
> > > > > > > > > >> > >
> > > > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > > > > > > > >> > > wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > > Hi Pavel,
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the flag of changed
> > > > > > > > > > >> > > > topology is lazy. Also I missed that the keepAlive setting is
> > > > > > > > > > >> > > > configured on the client side (alternatively to idleTimeout that
> > > > > > > > > > >> > > > is on the server side).
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > Now I understand, this feature can be helpful then. Every client
> > > > > > > > > > >> > > > can configure itself in case it's possible to be idle sometimes,
> > > > > > > > > > >> > > > and choose an appropriate timeout by itself too. And by default
> > > > > > > > > > >> > > > the feature should be disabled.
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > WDYT, should we add a WARN message for clients that configure
> > > > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > > > > > > > > >> > > >
> > > > > > > > > >> > > >
> > > > > > > > > >> > > >
> > > > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > > > >> > > > wrote:
> > > > > > > > > > >> > > >
> > > > > > > > > > >> > > > > Ivan,
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > I suggest the following:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> > > > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > > > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> > > > > > > > > > >> > > > > certain period of time
> > > > > > > > > > >> > > > > 3. Already implemented: when
> > > > > > > > > > >> > > > > ClientConnectorConfiguration#idleTimeout is not zero, server
> > > > > > > > > > >> > > > > disconnects idle clients
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > This way we don't need server->client keepalives, as you
> > > > > > > > > > >> > > > > correctly noted.
> > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > > > > > > > >> > > > > wrote:
> > > > > > > > > > >> > > > >
> > > > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > > > > > > > > >> > > > > > 1. Client send in handshake flag, that it supports KEEP_ALIVE
> > > > > > > > > > >> > > > > > feature and server takes it into account.
> > > > > > > > > > >> > > > > > 2. Each request of client can be considered as keep-alive ping.
> > > > > > > > > > >> > > > > > 3. Client send failure should be processed using retry policy.
> > > > > > > > > > >> > > > > > 4. Server should not send keep-alive packets, it is redundant,
> > > > > > > > > > >> > > > > > but server should track requests from client and if there is no
> > > > > > > > > > >> > > > > > requests from client with KEEP_ALIVE feature,
> > > > > > > > > > >> > > > > > automatically close connection and free resources.
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > Similar approach is used in zookeeper clients.
> > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > > > >> > > > > > wrote:
> > > > > > > > > > >> > > > > >
> > > > > > > > > > >> > > > > > > Ivan,
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Ideally, the check should come from both sides.
> > > > > > > > > > >> > > > > > > - Client periodically sends keepalive to server
> > > > > > > > > > >> > > > > > > - Server periodically sends keepalive to client
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it is not
> > > > > > > > > > >> > > > > > > necessary to implement this in all thin clients.
> > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com>
> > > > > > > > > > >> > > > > > > wrote:
> > > > > > > > > > >> > > > > > >
> > > > > > > > > > >> > > > > > > > I suppose it is great idea, but this functionality can be
> > > > > > > > > > >> > > > > > > > hard to implement for some platforms. I.e. sync python
> > > > > > > > > > >> > > > > > > > client or php (there is no real multithreading for python
> > > > > > > > > > >> > > > > > > > (GIL) and php is single threaded by design). But for async
> > > > > > > > > > >> > > > > > > > clients it is not very hard to implement. Nevertheless,
> > > > > > > > > > >> > > > > > > > this feature should be optional, because of possible
> > > > > > > > > > >> > > > > > > > technical limitations.
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > Pavel, is this check mostly for client side? Or servers can
> > > > > > > > > > >> > > > > > > > do some actions if there is no activity from thin client
> > > > > > > > > > >> > > > > > > > (i.e. closing context and free resources such as queries'
> > > > > > > > > > >> > > > > > > > handles and so on?)
> > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09 AM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > > > >> > > > > > > > wrote:
> > > > > > > > > > >> > > > > > > >
> > > > > > > > > > >> > > > > > > > > Hi Maksim,
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > half-state is a possible situation when an Ignite node
> > > > > > > > > > >> > > > > > > > > > goes down or somehow removes connection to a thin client
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Half-open state is also possible when, for example, an
> > > > > > > > > > >> > > > > > > > > intermediate router is rebooted [1].
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > This is what we seem to have encountered with one of our
> > > > > > > > > > >> > > > > > > > > customers - they have a stable cluster, and long-living
> > > > > > > > > > >> > > > > > > > > (multiple days) thin client connections which can be idle
> > > > > > > > > > >> > > > > > > > > for some time.
> > > > > > > > > > >> > > > > > > > > And only when we send some data on such an idle connection
> > > > > > > > > > >> > > > > > > > > do we discover that it is broken.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness
> > > > > > > > > > >> > > > > > > > > > feature clients can be notified about topology changes
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification in a form of
> > > > > > > > > > >> > > > > > > > > a response message flag [2].
> > > > > > > > > > >> > > > > > > > > You won't get one on an idle connection.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > the connections are removed on the server side by client
> > > > > > > > > > >> > > > > > > > > > idle timeout
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > is it OK to keep such connections alive for a long time
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > I think it is up to the user.
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > in the case of partition awareness features it can lead
> > > > > > > > > > >> > > > > > > > > > to wasting TCP sockets on Ignite nodes, can't it
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > Can you please elaborate?
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > > > > > > > >> > > > > > > > > wrote:
> > > > > > > > > > >> > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Hi Pavel,
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions
> > > > > > > > > > >> > > > > > > > > > here to get the feature more clearly?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > As I understand it correctly, half-state is a possible
> > > > > > > > > > >> > > > > > > > > > situation when an Ignite node goes down or somehow removes
> > > > > > > > > > >> > > > > > > > > > connection to a thin client. But with enabled (true by
> > > > > > > > > > >> > > > > > > > > > default) partitionAwareness feature clients can be
> > > > > > > > > > >> > > > > > > > > > notified about topology changes. So, there are possible
> > > > > > > > > > >> > > > > > > > > > cases:
> > > > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it
> > > > > > > > > > >> > > > > > > > > > helps fail fast. But is it OK to connect a client to a
> > > > > > > > > > >> > > > > > > > > > single node only?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second
> > > > > > > > > > >> > > > > > > > > > option is "Long-living and mostly idle connections are
> > > > > > > > > > >> > > > > > > > > > especially susceptible to this behavior". If I understand
> > > > > > > > > > >> > > > > > > > > > correctly the connections are removed on the server side
> > > > > > > > > > >> > > > > > > > > > by client idle timeout. Can we just configure the idle
> > > > > > > > > > >> > > > > > > > > > timeout for cases where we really need keeping alive idle
> > > > > > > > > > >> > > > > > > > > > connections? Are there any other cases with unexpectedly
> > > > > > > > > > >> > > > > > > > > > dropped connections?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such connections alive for
> > > > > > > > > > >> > > > > > > > > > a long time? Also in the case of partition awareness
> > > > > > > > > > >> > > > > > > > > > features it can lead to wasting TCP sockets on Ignite
> > > > > > > > > > >> > > > > > > > > > nodes, can't it?
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > Thanks!
> > > > > > > > > >> > > > > > > > > >
> > > > > > > > > >> > > > > > > > > >
> > > > > > > > > >> > > > > > > > > >
> > > > > > > > > >> > > > > > > > > >
> > > > > > > > > >> > > > > > > > > >
> > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > > > > >> > > > > > > > > > wrote:
> > > > > > > > > > >> > > > > > > > > >
> > > > > > > > > > >> > > > > > > > > >> Igniters,
> > > > > > > > > > >> > > > > > > > > >>
> > > > > > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the
> > > > > > > > > > >> > > > > > > > > >> thin client protocol (both 2.x and 3.x) and let me know your
> > > > > > > > > > >> > > > > > > > > >> thoughts:
> > > > > > > > > > >> > > > > > > > > >>
> > > > > > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Maksim Timonin <ti...@apache.org>.
Hi Ivan,

The cases you described sound reasonable to me. Then the client should just
set the `keepAlive` flag, and it just works.

So, there are 3 branches:
1. Users don't configure keepAlive at all.
2. Users configure keepAliveHeartbeatInterval (long, ms).
3. Users configure keepAlive (boolean).

AFAIU, Pavel's proposal covers only the second case. But the 2nd and 3rd
cases do not actually conflict with each other. I think that for both
branches the cluster should respond with the idleTimeout value on every
keepalive client request, because idleTimeout can change after a cluster
restart, upgrade, etc. Clients should check every response for a changed
idleTimeout: in the 2nd case write a WARN message, and in the 3rd
reconfigure themselves.
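
The three branches could be sketched roughly as below. This is only an
illustration: the class, the method and the parameter names are hypothetical
and not part of any Ignite API, and the rule "warn when the explicit interval
is not shorter than the reported idleTimeout" is my assumption about when a
WARN is useful.

```java
import java.util.OptionalLong;

final class KeepAliveBranches {
    /** What the client does after a keepalive response reports the server idle timeout. */
    enum Action { NOTHING, LOG_WARN, RECONFIGURE }

    /**
     * Hypothetical decision helper, not an Ignite API.
     *
     * @param keepAlive             branch 3: only the boolean flag is set
     * @param explicitIntervalMs    branch 2: user-defined interval, empty if not configured
     * @param lastIdleTimeoutMs     idleTimeout seen in the previous response
     * @param reportedIdleTimeoutMs idleTimeout in the current response, 0 means disabled
     */
    static Action onResponse(boolean keepAlive, OptionalLong explicitIntervalMs,
                             long lastIdleTimeoutMs, long reportedIdleTimeoutMs) {
        boolean changed = lastIdleTimeoutMs != reportedIdleTimeoutMs;

        if (explicitIntervalMs.isPresent()) {
            // Branch 2: never override the user's value, only warn when it is too long
            // (assumption: "too long" means not shorter than the server idle timeout).
            boolean tooLong = reportedIdleTimeoutMs > 0
                && explicitIntervalMs.getAsLong() >= reportedIdleTimeoutMs;
            return changed && tooLong ? Action.LOG_WARN : Action.NOTHING;
        }

        if (keepAlive) {
            // Branch 3: the client owns the interval, so it follows the server.
            return changed ? Action.RECONFIGURE : Action.NOTHING;
        }

        // Branch 1: keepalive is not configured at all.
        return Action.NOTHING;
    }

    public static void main(String[] args) {
        System.out.println(onResponse(false, OptionalLong.of(10_000), 0, 5_000));
        System.out.println(onResponse(true, OptionalLong.empty(), 0, 5_000));
    }
}
```

The explicit interval is checked first so that a user-defined value always
wins over the boolean flag when both are set.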




On Tue, Feb 15, 2022 at 9:51 AM Ivan Daschinsky <iv...@gmail.com> wrote:

> Regarding discussion here [1]
>
> I suppose that this feature, despite the fact that Pavel's initial intention
> was different, can drastically improve the usage pattern of thin clients and
> open up a lot of opportunities if the following is done:
>
> 1. GridNioServer has a great feature -- idle timeout. If a server does not
> receive anything from a client within the timeout, the client is kicked off.
>     But there are some scenarios that make this feature impossible to use:
> a. Multiple workers waiting for batch tasks with a relatively low request
> rate -- these services will often be kicked off and must reconnect.
> In order to prevent this behaviour, the user must implement a kind of
> heartbeating himself.
> b. Quite often a user may want to implement a leader-follower pattern for
> services for HA, so followers will also be considered idle. Kicking off
> these followers is not acceptable, so the user must also implement
> heartbeating himself.
>
> My proposition is:
> 1. Add two settings -- a flag to enable/disable heartbeats, and a very
> optional heartbeat timeout. Set the flag to true by default, and the
> timeout to a default heartbeat timeout.
> 2. If server and client both support this feature, and heartbeats are not
> explicitly disabled on the client side:
> 3. The response to a heartbeat request is the idle timeout. If idle timeout
> is set on the server side, set the heartbeat timeout to one-third of it;
> otherwise set it to the default or specified value.
>
> Pros:
> 1. Easy to set up -- just a flag on the client side and just a timeout on
> the server side.
> 2. Hard to misconfigure, i.e. to set a heartbeat timeout that is not short
> enough to prevent being kicked out by the server.
> 3. If the user just wants heartbeats without setting an idle timeout --
> heartbeats are on by default with a reasonable timeout.
>
> Cons:
> 1. If someone relies on the old behavior and just wants to drop his clients
> on timeout -- this will not work without reconfiguring; he must disable
> heartbeats.
> But I cannot even imagine that someone would find this behaviour desirable.
> I strongly believe that this behaviour prevents users from using
> idleTimeout on the server side.
>
> [1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955
>
> On Fri, Feb 11, 2022 at 10:58 AM Pavel Tupitsyn <pt...@apache.org> wrote:
>
> > I've prepared a PR, please have a look:
> > https://github.com/apache/ignite/pull/9817
> >
> > On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> > > I see potential in this feature, especially if we use something like
> > > continuous query. Stale clients can consume a lot of resources and it
> is
> > > worth kick these clients out.
> > >
> > > On Mon, Feb 7, 2022 at 6:25 PM Pavel Tupitsyn <pt...@apache.org> wrote:
> > >
> > > > > If we use new approach, we can reduce this timeout. But this can
> > affect
> > > > old clients.
> > > >
> > > > idleTimeout is disabled by default, we are not going to change this.
> > > >
> > > > > Also, let's think about that sending heartbeats and interval of
> > sending
> > > > > heartbeats could be calculated on the server side (i.e. one third
> of
> > > idle
> > > > > timeout) and sent to the client during handshake.
> > > > > Also we can introduce something like a negotiation mechanism as in
> > > > > zookeeper.
> > > >
> > > > I tend to agree with Maksim here, let's keep it simple and explicit.
> > > > Log a warning, but don't do anything clever.
> > > >
> > > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <iv...@gmail.com>
> > > > wrote:
> > > >
> > > > > >> idleTimeout already exists, I don't think we should change the
> way
> > > it
> > > > > works (or did I misunderstand you?)
> > > > > If we use new approach, we can reduce this timeout. But this can
> > affect
> > > > old
> > > > > clients.
> > > > >
> > > > >
> > > > > Also, let's think about that sending heartbeats and interval of
> > sending
> > > > > heartbeats could be calculated on the server side (i.e. one third
> of
> > > idle
> > > > > timeout) and sent to the client
> > > > > during handshake.
> > > > > Also we can introduce something like a negotiation mechanism as in
> > > > > zookeeper.
> > > > >
> > > > >
> > > > > > On Mon, Feb 7, 2022 at 6:05 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > >
> > > > > > Igor,
> > > > > >
> > > > > > > Maybe clients should pass this information on to the handshake.
> > > > > >
> > > > > > Do you think we should log a mismatched timeout warning on the
> > > server,
> > > > > not
> > > > > > on the client?
> > > > > > Or should we do both?
> > > > > >
> > > > > >
> > > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other
> > > > details
> > > > > > discussed above.
> > > > > >
> > > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <is...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Feature seems useful for me as it makes connection management
> > more
> > > > > robust
> > > > > > > and
> > > > > > > predictable.
> > > > > > >
> > > > > > > I agree with Pavel, that we should print warning when heartbeat
> > > > period
> > > > > is
> > > > > > > larger than
> > > > > > > idle timeout, but I see a problem here as idle timeout is
> > > configured
> > > > on
> > > > > > > server and is not
> > > > > > > known to clients, while heartbeats configured on clients and
> > their
> > > > > period
> > > > > > > is not known
> > > > > > > to the server. Maybe clients should pass this information on to
> > the
> > > > > > > handshake.
> > > > > > >
> > > > > > > Regarding Python and PHP clients - can not we use some kind of
> > > timers
> > > > > to
> > > > > > > implement
> > > > > > > this feature?
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Igor
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > ptupitsyn@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Maksim, agree. Let's not be too clever and only log a
> warning.
> > > > > > > >
> > > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > > ptupitsyn@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Ivan, idleTimeout already exists, I don't think we should
> > > change
> > > > > the
> > > > > > > way
> > > > > > > > > it works (or did I misunderstand you?)
> > > > > > > > >
> > > > > > > > > Of course, enabling heartbeats means that otherwise idle
> > > clients
> > > > > will
> > > > > > > no
> > > > > > > > > longer be disconnected by the server.
> > > > > > > > > I think we should cross-link those properties in the
> > > > documentation
> > > > > > and
> > > > > > > > > explain this behavior.
> > > > > > > > >
> > > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <
> > > > > ivandasch@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> >>3. Already implemented: when
> > > > > > > ClientConnectorConfiguration#idleTimeout
> > > > > > > > is
> > > > > > > > >> not zero, server disconnects idle clients
> > > > > > > > >> >>
> > > > > > > > >> But I suppose it would be great to have:
> > > > > > > > >> 1. If client supports keep alive, use idleTimeout
> > > > > > > > >> 2. If not, do not use it.
> > > > > > > > >>
> > > > > > > > >> But I am not sure if it is correct or not.
> > > > > > > > >>
> > > > > > > > > >> On Mon, Feb 7, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org> wrote:
> > > > > > > > >>
> > > > > > > > >> > I believe explicit is better than implicit :) Also in
> case
> > > of
> > > > > > > dynamic
> > > > > > > > >> > calculation of timeout, it can change dynamically, for
> > > example
> > > > > > > > >> restarting a
> > > > > > > > >> > cluster with different configuration should reconfigure
> > > > clients
> > > > > > too.
> > > > > > > > >> Looks
> > > > > > > > >> > complicated.
> > > > > > > > >> >
> > > > > > > > >> > My vote for WARN + javadocs with mention of this issue.
> > > > > > > > >> >
> > > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <
> > > > > > ptupitsyn@apache.org
> > > > > > > >
> > > > > > > > >> > wrote:
> > > > > > > > >> >
> > > > > > > > >> > > > WDYT, should we add a WARN message for clients that
> > > > > configure
> > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the
> > server
> > > > > side?
> > > > > > > > >> > >
> > > > > > > > >> > > I think we should either log a WARN, or retrieve
> > > idleTimeout
> > > > > > from
> > > > > > > > >> server
> > > > > > > > >> > > and configure heartbeatTimeout accordingly (e.g.
> divide
> > by
> > > > 2).
> > > > > > > > >> > > Thoughts?
> > > > > > > > >> > >
> > > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <
> > > > > > > > >> timoninmaxim@apache.org>
> > > > > > > > >> > > wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > > Hi Pavel,
> > > > > > > > >> > > >
> > > > > > > > >> > > > Thanks for the links. Yes, I forgot that the flag of
> > > > changed
> > > > > > > > >> topology
> > > > > > > > >> > is
> > > > > > > > >> > > > lazy. Also I missed that the keepAlive setting is
> > > > configured
> > > > > > on
> > > > > > > > the
> > > > > > > > >> > > client
> > > > > > > > >> > > > side (alternatively to idleTimeout that is on the
> > server
> > > > > > side).
> > > > > > > > >> > > >
> > > > > > > > >> > > > Now I understand, this feature can be helpful then.
> > > Every
> > > > > > client
> > > > > > > > can
> > > > > > > > >> > > > configure itself in case it's possible to be idle
> > > > sometimes,
> > > > > > and
> > > > > > > > >> choose
> > > > > > > > >> > > > an appropriate timeout by itself too. And by default
> > the
> > > > > > feature
> > > > > > > > >> should
> > > > > > > > >> > > be
> > > > > > > > >> > > > disabled.
> > > > > > > > >> > > >
> > > > > > > > >> > > > WDYT, should we add a WARN message for clients that
> > > > > configure
> > > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the
> > server
> > > > > side?
> > > > > > > > >> > > >
> > > > > > > > >> > > >
> > > > > > > > >> > > >
> > > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <
> > > > > > > > ptupitsyn@apache.org
> > > > > > > > >> >
> > > > > > > > >> > > > wrote:
> > > > > > > > >> > > >
> > > > > > > > >> > > > > Ivan,
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > I suggest the following:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which
> means
> > > it
> > > > > > > accepts
> > > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection
> is
> > > > idle
> > > > > > for
> > > > > > > a
> > > > > > > > >> > > > > certain period of time
> > > > > > > > >> > > > > 3. Already implemented: when
> > > > > > > > >> ClientConnectorConfiguration#idleTimeout
> > > > > > > > >> > > is
> > > > > > > > >> > > > > not zero, server disconnects idle clients
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > This way we don't need server->client keepalives,
> as
> > > you
> > > > > > > > correctly
> > > > > > > > >> > > noted.
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <
> > > > > > > > >> ivandasch@gmail.com
> > > > > > > > >> > >
> > > > > > > > >> > > > > wrote:
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > > > > > > >> > > > > > 1. Client send in handshake flag, that it
> supports
> > > > > > > KEEP_ALIVE
> > > > > > > > >> > feature
> > > > > > > > >> > > > and
> > > > > > > > >> > > > > > server takes it into account.
> > > > > > > > >> > > > > > 2. Each request of client can be considered as
> > > > > keep-alive
> > > > > > > > ping.
> > > > > > > > >> > > > > > 3. Client send failure should be processed using
> > > retry
> > > > > > > policy.
> > > > > > > > >> > > > > > 4. Server should not send keep-alive packets, it
> > is
> > > > > > > redundant,
> > > > > > > > >> but
> > > > > > > > >> > > > server
> > > > > > > > >> > > > > > should track requests from client and if there
> is
> > no
> > > > > > > requests
> > > > > > > > >> from
> > > > > > > > >> > > > client
> > > > > > > > >> > > > > > with KEEP_ALIVE feature,
> > > > > > > > >> > > > > > automatically close connection and free
> resources.
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Similar approach is used in zookeeper clients.
> > > > > > > > >> > > > > >
> > > > > > > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > > Ivan,
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Ideally, the check should come from both
> sides.
> > > > > > > > >> > > > > > > - Client periodically sends keepalive to
> server
> > > > > > > > >> > > > > > > - Server periodically sends keepalive to
> client
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > Feature flags will be added accordingly, so it
> > is
> > > > not
> > > > > > > > >> necessary
> > > > > > > > >> > to
> > > > > > > > >> > > > > > > implement this in all thin clients.
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan
> Daschinsky
> > <
> > > > > > > > >> > > ivandasch@gmail.com
> > > > > > > > >> > > > >
> > > > > > > > >> > > > > > > wrote:
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > > > > I suppose it is great idea, but this
> > > functionality
> > > > > can
> > > > > > > be
> > > > > > > > >> hard
> > > > > > > > >> > to
> > > > > > > > >> > > > > > > implement
> > > > > > > > >> > > > > > > > for some platforms. I.e. sync python client
> or
> > > php
> > > > > > > (there
> > > > > > > > >> is no
> > > > > > > > >> > > > real
> > > > > > > > >> > > > > > > > multithreading for python (GIL) and php is
> > > single
> > > > > > > threaded
> > > > > > > > >> by
> > > > > > > > >> > > > > design).
> > > > > > > > >> > > > > > > But
> > > > > > > > >> > > > > > > > for async clients it is not very hard to
> > > > implement.
> > > > > > > > >> > Nevertheless,
> > > > > > > > >> > > > > this
> > > > > > > > >> > > > > > > > feature should be optional, because of
> > possible
> > > > > > > technical
> > > > > > > > >> > > > > limitations.
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Pavel, is this check mostly for client side?
> > Or
> > > > > > servers
> > > > > > > > can
> > > > > > > > >> do
> > > > > > > > >> > > some
> > > > > > > > >> > > > > > > actions
> > > > > > > > >> > > > > > > > if there is no activity from thin client
> (i.e.
> > > > > closing
> > > > > > > > >> context
> > > > > > > > >> > > and
> > > > > > > > >> > > > > free
> > > > > > > > >> > > > > > > > resources such as queries' handles and so
> on?)
> > > > > > > > >> > > > > > > >
> > > > > > > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09 AM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > > Hi Maksim,
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > > half-state is a possible situation when
> an
> > > > > Ignite
> > > > > > > node
> > > > > > > > >> goes
> > > > > > > > >> > > > down
> > > > > > > > >> > > > > or
> > > > > > > > >> > > > > > > > > somehow removes connection to a thin
> client
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Half-open state is also possible when, for
> > > > > example,
> > > > > > an
> > > > > > > > >> > > > intermediate
> > > > > > > > >> > > > > > > > router
> > > > > > > > >> > > > > > > > > is rebooted [1].
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > This is what we seem to have encountered
> > with
> > > > one
> > > > > of
> > > > > > > our
> > > > > > > > >> > > > customers
> > > > > > > > >> > > > > -
> > > > > > > > >> > > > > > > they
> > > > > > > > >> > > > > > > > > have a stable cluster, and long-living
> > > (multiple
> > > > > > days)
> > > > > > > > >> thin
> > > > > > > > >> > > > client
> > > > > > > > >> > > > > > > > > connections which can be idle for some
> time.
> > > > > > > > >> > > > > > > > > And only when we send some data on such an
> > > idle
> > > > > > > > >> connection do
> > > > > > > > >> > > we
> > > > > > > > >> > > > > > > discover
> > > > > > > > >> > > > > > > > > that it is broken.
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > > But with enabled (true by default)
> > > > > > > partitionAwareness
> > > > > > > > >> > feature
> > > > > > > > >> > > > > > clients
> > > > > > > > >> > > > > > > > can
> > > > > > > > >> > > > > > > > > be notified about topology changes
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Partition awareness is a "lazy"
> notification
> > > in
> > > > a
> > > > > > form
> > > > > > > > of
> > > > > > > > >> a
> > > > > > > > >> > > > > response
> > > > > > > > >> > > > > > > > > message flag [2].
> > > > > > > > >> > > > > > > > > You won't get one on an idle connection.
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > > the connections are removed on the
> server
> > > side
> > > > > by
> > > > > > > > client
> > > > > > > > >> > idle
> > > > > > > > >> > > > > > timeout
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > > is it OK to keep such connections alive
> > for
> > > a
> > > > > long
> > > > > > > > time
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > I think it is up to the user.
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > > in the case of partition awareness
> > features
> > > it
> > > > > can
> > > > > > > > lead
> > > > > > > > >> to
> > > > > > > > >> > > > > wasting
> > > > > > > > >> > > > > > > TCP
> > > > > > > > >> > > > > > > > > sockets on Ignite nodes, can't it
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > Can you please elaborate?
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > [1]
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > > > > >> > > > > > > > > [2]
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim
> > Timonin
> > > <
> > > > > > > > >> > > > > > timoninmaxim@apache.org
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > > wrote:
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > > > > Hi Pavel,
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I
> ask
> > > > some
> > > > > > > > >> questions
> > > > > > > > >> > > here
> > > > > > > > >> > > > to
> > > > > > > > >> > > > > > get
> > > > > > > > >> > > > > > > > the
> > > > > > > > >> > > > > > > > > > feature more clearly?
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > As I understand it correctly, half-state
> > is
> > > a
> > > > > > > possible
> > > > > > > > >> > > > situation
> > > > > > > > >> > > > > > when
> > > > > > > > >> > > > > > > > an
> > > > > > > > >> > > > > > > > > > Ignite node goes down or somehow removes
> > > > > > connection
> > > > > > > > to a
> > > > > > > > >> > thin
> > > > > > > > >> > > > > > client.
> > > > > > > > >> > > > > > > > But
> > > > > > > > >> > > > > > > > > > with enabled (true by default)
> > > > > partitionAwareness
> > > > > > > > >> feature
> > > > > > > > >> > > > clients
> > > > > > > > >> > > > > > can
> > > > > > > > >> > > > > > > > be
> > > > > > > > >> > > > > > > > > > notified about topology changes. So,
> there
> > > are
> > > > > > > > possible
> > > > > > > > >> > > cases:
> > > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from
> > > itself.
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > I like the idea for the case with a
> single
> > > > node,
> > > > > > as
> > > > > > > it
> > > > > > > > >> > helps
> > > > > > > > >> > > > fail
> > > > > > > > >> > > > > > > fast.
> > > > > > > > >> > > > > > > > > > But is it OK to connect a client to a
> > single
> > > > > node
> > > > > > > > only?
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > For the second one: you mention that a
> > case
> > > > for
> > > > > > the
> > > > > > > > >> second
> > > > > > > > >> > > > option
> > > > > > > > >> > > > > > is
> > > > > > > > >> > > > > > > > > > "Long-living and mostly idle connections
> > are
> > > > > > > > especially
> > > > > > > > >> > > > > susceptible
> > > > > > > > >> > > > > > > to
> > > > > > > > >> > > > > > > > > this
> > > > > > > > >> > > > > > > > > > behavior". If I understand correctly the
> > > > > > connections
> > > > > > > > are
> > > > > > > > >> > > > removed
> > > > > > > > >> > > > > on
> > > > > > > > >> > > > > > > the
> > > > > > > > >> > > > > > > > > > server side by client idle timeout. Can
> we
> > > > just
> > > > > > > > >> configure
> > > > > > > > >> > the
> > > > > > > > >> > > > > idle
> > > > > > > > >> > > > > > > > > timeout
> > > > > > > > >> > > > > > > > > > for cases where we really need keeping
> > alive
> > > > > idle
> > > > > > > > >> > > connections?
> > > > > > > > >> > > > > Are
> > > > > > > > >> > > > > > > > there
> > > > > > > > >> > > > > > > > > > any other cases with unexpectedly
> dropped
> > > > > > > connections?
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such
> > > > connections
> > > > > > > alive
> > > > > > > > >> for a
> > > > > > > > >> > > > long
> > > > > > > > >> > > > > > > time?
> > > > > > > > >> > > > > > > > > > Also in the case of partition awareness
> > > > features
> > > > > > it
> > > > > > > > can
> > > > > > > > >> > lead
> > > > > > > > >> > > to
> > > > > > > > >> > > > > > > wasting
> > > > > > > > >> > > > > > > > > TCP
> > > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > Thanks!
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel
> > > Tupitsyn
> > > > <
> > > > > > > > >> > > > > > ptupitsyn@apache.org>
> > > > > > > > >> > > > > > > > > > wrote:
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > > >> Igniters,
> > > > > > > > >> > > > > > > > > >>
> > > > > > > > >> > > > > > > > > >> Please review the proposal to add
> > heartbeat
> > > > > > > messages
> > > > > > > > to
> > > > > > > > >> > the
> > > > > > > > >> > > > thin
> > > > > > > > >> > > > > > > > client
> > > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me
> > know
> > > > > your
> > > > > > > > >> thoughts:
> > > > > > > > >> > > > > > > > > >>
> > > > > > > > >> > > > > > > > > >>
> > > > > > > > >> > > > > > > > > >>
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > > > > > >> > > > > > > > > >>
> > > > > > > > >> > > > > > > > > >
> > > > > > > > >> > > > > > > > >
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > --
> > > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > > > --
> > > > > > > > >> > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > > > > >> > > > > >
> > > > > > > > >> > > > >
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Sincerely yours, Ivan Daschinskiy
> > > > > > > > >>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours, Ivan Daschinskiy
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sincerely yours, Ivan Daschinskiy
> > >
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
Regarding discussion here [1]

I suppose that this feature, despite the fact that Pavel's initial intention
was different, can drastically improve the usage pattern of thin clients and
open up a lot of opportunities if the following is done:

1. GridNioServer has a great feature -- idle timeout. If a server does not
receive anything from a client within the timeout, the client is kicked off.
    But there are some scenarios that make this feature impossible to use:
a. Multiple workers waiting for batch tasks with a relatively low request
rate -- these services will often be kicked off and must reconnect.
In order to prevent this behaviour, the user must implement a kind of
heartbeating himself.
b. Quite often a user may want to implement a leader-follower pattern for
services for HA, so followers will also be considered idle. Kicking off
these followers is not acceptable, so the user must also implement
heartbeating himself.

My proposition is:
1. Add two settings -- a flag to enable/disable heartbeats, and a very
optional heartbeat timeout. Set the flag to true by default, and the
timeout to a default heartbeat timeout.
2. If server and client both support this feature, and heartbeats are not
explicitly disabled on the client side:
3. The response to a heartbeat request is the idle timeout. If idle timeout
is set on the server side, set the heartbeat timeout to one-third of it;
otherwise set it to the default or specified value.
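
Step 3 could look roughly like the sketch below. The class name, the method
name and the 30-second default are hypothetical placeholders (no concrete
default has been agreed on here), and treating idleTimeout = 0 as "disabled"
is an assumption.

```java
final class HeartbeatInterval {
    /** Hypothetical default; no concrete value has been agreed on. */
    static final long DEFAULT_INTERVAL_MS = 30_000L;

    /**
     * @param serverIdleTimeoutMs idle timeout reported by the server, 0 means disabled
     * @param userIntervalMs      interval specified by the user, or null if not set
     * @return heartbeat interval in milliseconds
     */
    static long resolve(long serverIdleTimeoutMs, Long userIntervalMs) {
        if (serverIdleTimeoutMs > 0) {
            // Idle timeout is set on the server: use one-third of it,
            // but never less than 1 ms so that a timer can still be scheduled.
            return Math.max(1L, serverIdleTimeoutMs / 3);
        }

        // No idle timeout on the server: specified value, else the default.
        return userIntervalMs != null ? userIntervalMs : DEFAULT_INTERVAL_MS;
    }

    public static void main(String[] args) {
        System.out.println(resolve(9_000, null));
        System.out.println(resolve(0, 5_000L));
    }
}
```

With one-third of idleTimeout, up to two consecutive heartbeats can be
delayed or lost before the server considers the connection idle.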

Pros:
1. Easy to set up -- just a flag on the client side and just a timeout on
the server side.
2. Hard to misconfigure, i.e. to set a heartbeat timeout that is not short
enough to prevent being kicked out by the server.
3. If the user just wants heartbeats without setting an idle timeout --
heartbeats are on by default with a reasonable timeout.

Cons:
1. If someone relies on the old behavior and just wants to drop his clients
on timeout -- this will not work without reconfiguring; he must disable
heartbeats.
But I cannot even imagine that someone would find this behaviour desirable.
I strongly believe that this behaviour prevents users from using
idleTimeout on the server side.

[1] -- https://github.com/apache/ignite/pull/9817#discussion_r805628955

On Fri, Feb 11, 2022 at 10:58 AM Pavel Tupitsyn <pt...@apache.org> wrote:

> I've prepared a PR, please have a look:
> https://github.com/apache/ignite/pull/9817
>
> On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <iv...@gmail.com>
> wrote:
>
> > I see potential in this feature, especially if we use something like
> > continuous query. Stale clients can consume a lot of resources and it is
> > worth kick these clients out.
> >
> > On Mon, Feb 7, 2022 at 6:25 PM Pavel Tupitsyn <pt...@apache.org> wrote:
> >
> > > > If we use new approach, we can reduce this timeout. But this can
> affect
> > > old clients.
> > >
> > > idleTimeout is disabled by default, we are not going to change this.
> > >
> > > > Also, let's think about that sending heartbeats and interval of
> sending
> > > > heartbeats could be calculated on the server side (i.e. one third of
> > idle
> > > > timeout) and sent to the client during handshake.
> > > > Also we can introduce something like a negotiation mechanism as in
> > > > zookeeper.
> > >
> > > I tend to agree with Maksim here, let's keep it simple and explicit.
> > > Log a warning, but don't do anything clever.
> > >
> > > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <iv...@gmail.com>
> > > wrote:
> > >
> > > > >> idleTimeout already exists, I don't think we should change the way
> > it
> > > > works (or did I misunderstand you?)
> > > > If we use new approach, we can reduce this timeout. But this can
> affect
> > > old
> > > > clients.
> > > >
> > > >
> > > > Also, let's think about that sending heartbeats and interval of
> sending
> > > > heartbeats could be calculated on the server side (i.e. one third of
> > idle
> > > > timeout) and sent to the client
> > > > during handshake.
> > > > Also we can introduce something like a negotiation mechanism as in
> > > > zookeeper.
> > > >
> > > >
> > > > > On Mon, Feb 7, 2022 at 6:05 PM Pavel Tupitsyn <pt...@apache.org> wrote:
> > > >
> > > > > Igor,
> > > > >
> > > > > > Maybe clients should pass this information on to the handshake.
> > > > >
> > > > > Do you think we should log a mismatched timeout warning on the
> > server,
> > > > not
> > > > > on the client?
> > > > > Or should we do both?
> > > > >
> > > > >
> > > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other
> > > details
> > > > > discussed above.
> > > > >
> > > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <is...@apache.org>
> > wrote:
> > > > >
> > > > > > Feature seems useful for me as it makes connection management
> more
> > > > robust
> > > > > > and
> > > > > > predictable.
> > > > > >
> > > > > > I agree with Pavel, that we should print warning when heartbeat
> > > period
> > > > is
> > > > > > larger than
> > > > > > idle timeout, but I see a problem here as idle timeout is
> > configured
> > > on
> > > > > > server and is not
> > > > > > known to clients, while heartbeats configured on clients and
> their
> > > > period
> > > > > > is not known
> > > > > > to the server. Maybe clients should pass this information on to
> the
> > > > > > handshake.
> > > > > >
> > > > > > Regarding Python and PHP clients - can not we use some kind of
> > timers
> > > > to
> > > > > > implement
> > > > > > this feature?
> > > > > >
> > > > > > Best Regards,
> > > > > > Igor
> > > > > >
> > > > > >
> > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > ptupitsyn@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Maksim, agree. Let's not be too clever and only log a warning.
> > > > > > >
> > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <
> > > ptupitsyn@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Ivan, idleTimeout already exists, I don't think we should
> > change
> > > > the
> > > > > > way
> > > > > > > > it works (or did I misunderstand you?)
> > > > > > > >
> > > > > > > > Of course, enabling heartbeats means that otherwise idle
> > clients
> > > > will
> > > > > > no
> > > > > > > > longer be disconnected by the server.
> > > > > > > > I think we should cross-link those properties in the
> > > documentation
> > > > > and
> > > > > > > > explain this behavior.
> > > > > > > >
> > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <
> > > > ivandasch@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> >>3. Already implemented: when
> > > > > > ClientConnectorConfiguration#idleTimeout
> > > > > > > is
> > > > > > > >> not zero, server disconnects idle clients
> > > > > > > >> >>
> > > > > > > >> But I suppose it would be great to have:
> > > > > > > >> 1. If client supports keep alive, use idleTimeout
> > > > > > > >> 2. If not, do not use it.
> > > > > > > >>
> > > > > > > >> But I am not sure if it is correct or not.
> > > > > > > >>
> > > > > > > > >> Mon, Feb 7, 2022 at 16:01, Maksim Timonin <
> > > > > timoninmaxim@apache.org
> > > > > > >:
> > > > > > > >>
> > > > > > > >> > I believe explicit is better than implicit :) Also in case
> > of
> > > > > > dynamic
> > > > > > > >> > calculation of timeout, it can change dynamically, for
> > example
> > > > > > > >> restarting a
> > > > > > > >> > cluster with different configuration should reconfigure
> > > clients
> > > > > too.
> > > > > > > >> Looks
> > > > > > > >> > complicated.
> > > > > > > >> >
> > > > > > > >> > My vote for WARN + javadocs with mention of this issue.
> > > > > > > >> >
> > > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <
> > > > > ptupitsyn@apache.org
> > > > > > >
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > > WDYT, should we add a WARN message for clients that
> > > > configure
> > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the
> server
> > > > side?
> > > > > > > >> > >
> > > > > > > >> > > I think we should either log a WARN, or retrieve
> > idleTimeout
> > > > > from
> > > > > > > >> server
> > > > > > > >> > > and configure heartbeatTimeout accordingly (e.g. divide
> by
> > > 2).
> > > > > > > >> > > Thoughts?
> > > > > > > >> > >
> > > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <
> > > > > > > >> timoninmaxim@apache.org>
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi Pavel,
> > > > > > > >> > > >
> > > > > > > >> > > > Thanks for the links. Yes, I forgot that the flag of
> > > changed
> > > > > > > >> topology
> > > > > > > >> > is
> > > > > > > >> > > > lazy. Also I missed that the keepAlive setting is
> > > configured
> > > > > on
> > > > > > > the
> > > > > > > >> > > client
> > > > > > > >> > > > side (alternatively to idleTimeout that is on the
> server
> > > > > side).
> > > > > > > >> > > >
> > > > > > > >> > > > Now I understand, this feature can be helpful then.
> > Every
> > > > > client
> > > > > > > can
> > > > > > > >> > > > configure itself in case it's possible to be idle
> > > sometimes,
> > > > > and
> > > > > > > >> choose
> > > > > > > >> > > > an appropriate timeout by itself too. And by default
> the
> > > > > feature
> > > > > > > >> should
> > > > > > > >> > > be
> > > > > > > >> > > > disabled.
> > > > > > > >> > > >
> > > > > > > >> > > > WDYT, should we add a WARN message for clients that
> > > > configure
> > > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the
> server
> > > > side?
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > >
> > > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <
> > > > > > > ptupitsyn@apache.org
> > > > > > > >> >
> > > > > > > >> > > > wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > > Ivan,
> > > > > > > >> > > > >
> > > > > > > >> > > > > I suggest the following:
> > > > > > > >> > > > >
> > > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means
> > it
> > > > > > accepts
> > > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is
> > > idle
> > > > > for
> > > > > > a
> > > > > > > >> > > > > certain period of time
> > > > > > > >> > > > > 3. Already implemented: when
> > > > > > > >> ClientConnectorConfiguration#idleTimeout
> > > > > > > >> > > is
> > > > > > > >> > > > > not zero, server disconnects idle clients
> > > > > > > >> > > > >
> > > > > > > >> > > > > This way we don't need server->client keepalives, as
> > you
> > > > > > > correctly
> > > > > > > >> > > noted.
> > > > > > > >> > > > >
> > > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <
> > > > > > > >> ivandasch@gmail.com
> > > > > > > >> > >
> > > > > > > >> > > > > wrote:
> > > > > > > >> > > > >
> > > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > > > > > >> > > > > > 1. Client send in handshake flag, that it supports
> > > > > > KEEP_ALIVE
> > > > > > > >> > feature
> > > > > > > >> > > > and
> > > > > > > >> > > > > > server takes it into account.
> > > > > > > >> > > > > > 2. Each request of client can be considered as
> > > > keep-alive
> > > > > > > ping.
> > > > > > > >> > > > > > 3. Client send failure should be processed using
> > retry
> > > > > > policy.
> > > > > > > >> > > > > > 4. Server should not send keep-alive packets, it
> is
> > > > > > redundant,
> > > > > > > >> but
> > > > > > > >> > > > server
> > > > > > > >> > > > > > should track requests from client and if there is
> no
> > > > > > requests
> > > > > > > >> from
> > > > > > > >> > > > client
> > > > > > > >> > > > > > with KEEP_ALIVE feature,
> > > > > > > >> > > > > > automatically close connection and free resources.
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > Similar approach is used in zookeeper clients.
> > > > > > > >> > > > > >
> > > > > > > > >> > > > > > Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <
> > > > > > > >> ptupitsyn@apache.org
> > > > > > > >> > >:
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > > Ivan,
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Ideally, the check should come from both sides.
> > > > > > > >> > > > > > > - Client periodically sends keepalive to server
> > > > > > > >> > > > > > > - Server periodically sends keepalive to client
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > Feature flags will be added accordingly, so it
> is
> > > not
> > > > > > > >> necessary
> > > > > > > >> > to
> > > > > > > >> > > > > > > implement this in all thin clients.
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky
> <
> > > > > > > >> > > ivandasch@gmail.com
> > > > > > > >> > > > >
> > > > > > > >> > > > > > > wrote:
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > > > > I suppose it is great idea, but this
> > functionality
> > > > can
> > > > > > be
> > > > > > > >> hard
> > > > > > > >> > to
> > > > > > > >> > > > > > > implement
> > > > > > > >> > > > > > > > for some platforms. I.e. sync python client or
> > php
> > > > > > (there
> > > > > > > >> is no
> > > > > > > >> > > > real
> > > > > > > >> > > > > > > > multithreading for python (GIL) and php is
> > single
> > > > > > threaded
> > > > > > > >> by
> > > > > > > >> > > > > design).
> > > > > > > >> > > > > > > But
> > > > > > > >> > > > > > > > for async clients it is not very hard to
> > > implement.
> > > > > > > >> > Nevertheless,
> > > > > > > >> > > > > this
> > > > > > > >> > > > > > > > feature should be optional, because of
> possible
> > > > > > technical
> > > > > > > >> > > > > limitations.
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Pavel, is this check mostly for client side?
> Or
> > > > > servers
> > > > > > > can
> > > > > > > >> do
> > > > > > > >> > > some
> > > > > > > >> > > > > > > actions
> > > > > > > >> > > > > > > > if there is no activity from thin client (i.e.
> > > > closing
> > > > > > > >> context
> > > > > > > >> > > and
> > > > > > > >> > > > > free
> > > > > > > >> > > > > > > > resources such as queries' handles and so on?)
> > > > > > > >> > > > > > > >
> > > > > > > > >> > > > > > > > Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <
> > > > > > > >> > > ptupitsyn@apache.org
> > > > > > > >> > > > >:
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > > Hi Maksim,
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > half-state is a possible situation when an
> > > > Ignite
> > > > > > node
> > > > > > > >> goes
> > > > > > > >> > > > down
> > > > > > > >> > > > > or
> > > > > > > >> > > > > > > > > somehow removes connection to a thin client
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Half-open state is also possible when, for
> > > > example,
> > > > > an
> > > > > > > >> > > > intermediate
> > > > > > > >> > > > > > > > router
> > > > > > > >> > > > > > > > > is rebooted [1].
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > This is what we seem to have encountered
> with
> > > one
> > > > of
> > > > > > our
> > > > > > > >> > > > customers
> > > > > > > >> > > > > -
> > > > > > > >> > > > > > > they
> > > > > > > >> > > > > > > > > have a stable cluster, and long-living
> > (multiple
> > > > > days)
> > > > > > > >> thin
> > > > > > > >> > > > client
> > > > > > > >> > > > > > > > > connections which can be idle for some time.
> > > > > > > >> > > > > > > > > And only when we send some data on such an
> > idle
> > > > > > > >> connection do
> > > > > > > >> > > we
> > > > > > > >> > > > > > > discover
> > > > > > > >> > > > > > > > > that it is broken.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > But with enabled (true by default)
> > > > > > partitionAwareness
> > > > > > > >> > feature
> > > > > > > >> > > > > > clients
> > > > > > > >> > > > > > > > can
> > > > > > > >> > > > > > > > > be notified about topology changes
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification
> > in
> > > a
> > > > > form
> > > > > > > of
> > > > > > > >> a
> > > > > > > >> > > > > response
> > > > > > > >> > > > > > > > > message flag [2].
> > > > > > > >> > > > > > > > > You won't get one on an idle connection.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > the connections are removed on the server
> > side
> > > > by
> > > > > > > client
> > > > > > > >> > idle
> > > > > > > >> > > > > > timeout
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > is it OK to keep such connections alive
> for
> > a
> > > > long
> > > > > > > time
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > I think it is up to the user.
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > in the case of partition awareness
> features
> > it
> > > > can
> > > > > > > lead
> > > > > > > >> to
> > > > > > > >> > > > > wasting
> > > > > > > >> > > > > > > TCP
> > > > > > > >> > > > > > > > > sockets on Ignite nodes, can't it
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > Can you please elaborate?
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > [1]
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > > > >> > > > > > > > > [2]
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim
> Timonin
> > <
> > > > > > > >> > > > > > timoninmaxim@apache.org
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > > wrote:
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > > > > Hi Pavel,
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask
> > > some
> > > > > > > >> questions
> > > > > > > >> > > here
> > > > > > > >> > > > to
> > > > > > > >> > > > > > get
> > > > > > > >> > > > > > > > the
> > > > > > > >> > > > > > > > > > feature more clearly?
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > As I understand it correctly, half-state
> is
> > a
> > > > > > possible
> > > > > > > >> > > > situation
> > > > > > > >> > > > > > when
> > > > > > > >> > > > > > > > an
> > > > > > > >> > > > > > > > > > Ignite node goes down or somehow removes
> > > > > connection
> > > > > > > to a
> > > > > > > >> > thin
> > > > > > > >> > > > > > client.
> > > > > > > >> > > > > > > > But
> > > > > > > >> > > > > > > > > > with enabled (true by default)
> > > > partitionAwareness
> > > > > > > >> feature
> > > > > > > >> > > > clients
> > > > > > > >> > > > > > can
> > > > > > > >> > > > > > > > be
> > > > > > > >> > > > > > > > > > notified about topology changes. So, there
> > are
> > > > > > > possible
> > > > > > > >> > > cases:
> > > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > > > > > >> > > > > > > > > > 2. Ignite node removes connection from
> > itself.
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > I like the idea for the case with a single
> > > node,
> > > > > as
> > > > > > it
> > > > > > > >> > helps
> > > > > > > >> > > > fail
> > > > > > > >> > > > > > > fast.
> > > > > > > >> > > > > > > > > > But is it OK to connect a client to a
> single
> > > > node
> > > > > > > only?
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > For the second one: you mention that a
> case
> > > for
> > > > > the
> > > > > > > >> second
> > > > > > > >> > > > option
> > > > > > > >> > > > > > is
> > > > > > > >> > > > > > > > > > "Long-living and mostly idle connections
> are
> > > > > > > especially
> > > > > > > >> > > > > susceptible
> > > > > > > >> > > > > > > to
> > > > > > > >> > > > > > > > > this
> > > > > > > >> > > > > > > > > > behavior". If I understand correctly the
> > > > > connections
> > > > > > > are
> > > > > > > >> > > > removed
> > > > > > > >> > > > > on
> > > > > > > >> > > > > > > the
> > > > > > > >> > > > > > > > > > server side by client idle timeout. Can we
> > > just
> > > > > > > >> configure
> > > > > > > >> > the
> > > > > > > >> > > > > idle
> > > > > > > >> > > > > > > > > timeout
> > > > > > > >> > > > > > > > > > for cases where we really need keeping
> alive
> > > > idle
> > > > > > > >> > > connections?
> > > > > > > >> > > > > Are
> > > > > > > >> > > > > > > > there
> > > > > > > >> > > > > > > > > > any other cases with unexpectedly dropped
> > > > > > connections?
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such
> > > connections
> > > > > > alive
> > > > > > > >> for a
> > > > > > > >> > > > long
> > > > > > > >> > > > > > > time?
> > > > > > > >> > > > > > > > > > Also in the case of partition awareness
> > > features
> > > > > it
> > > > > > > can
> > > > > > > >> > lead
> > > > > > > >> > > to
> > > > > > > >> > > > > > > wasting
> > > > > > > >> > > > > > > > > TCP
> > > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > Thanks!
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel
> > Tupitsyn
> > > <
> > > > > > > >> > > > > > ptupitsyn@apache.org>
> > > > > > > >> > > > > > > > > > wrote:
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > > >> Igniters,
> > > > > > > >> > > > > > > > > >>
> > > > > > > >> > > > > > > > > >> Please review the proposal to add
> heartbeat
> > > > > > messages
> > > > > > > to
> > > > > > > >> > the
> > > > > > > >> > > > thin
> > > > > > > >> > > > > > > > client
> > > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me
> know
> > > > your
> > > > > > > >> thoughts:
> > > > > > > >> > > > > > > > > >>
> > > > > > > >> > > > > > > > > >>
> > > > > > > >> > > > > > > > > >>
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > > > > >> > > > > > > > > >>
> > > > > > > >> > > > > > > > > >
> > > > > > > >> > > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > --
> > > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > > > >> > > > > > > >
> > > > > > > >> > > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > >
> > > > > > > >> > > > > > --
> > > > > > > >> > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > > > >> > > > > >
> > > > > > > >> > > > >
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Sincerely yours, Ivan Daschinskiy
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sincerely yours, Ivan Daschinskiy
> > > >
> > >
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>


-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
I've prepared a PR, please have a look:
https://github.com/apache/ignite/pull/9817
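
For readers skimming the thread, here is a minimal sketch of the client-side
behavior discussed in the quoted messages below: log a warning when the
heartbeat interval is greater than the server's idle timeout, and send a
heartbeat only after the connection has been idle for the configured interval.
This is not code from the PR. The function names and the millisecond units are
assumptions made for illustration, and a non-positive idle timeout is treated
as "disabled", following the "not zero" wording in the thread.

```python
from typing import Optional


def heartbeat_warning(heartbeat_interval_ms: int, idle_timeout_ms: int) -> Optional[str]:
    """Return a warning message if heartbeats cannot prevent an idle disconnect.

    Hypothetical helper: idle_timeout_ms <= 0 means the server-side idle
    timeout is disabled, so no warning is needed.
    """
    if idle_timeout_ms <= 0:
        return None
    if heartbeat_interval_ms > idle_timeout_ms:
        return (
            f"Heartbeat interval ({heartbeat_interval_ms} ms) is greater than "
            f"server idle timeout ({idle_timeout_ms} ms); the server may close "
            f"the connection before a heartbeat is sent."
        )
    return None


def should_send_heartbeat(now_ms: int, last_activity_ms: int, heartbeat_interval_ms: int) -> bool:
    """Any regular request counts as activity, so a heartbeat is only due
    when nothing was sent for at least one full interval."""
    return now_ms - last_activity_ms >= heartbeat_interval_ms


if __name__ == "__main__":
    # Idle timeout disabled on the server: nothing to warn about.
    print(heartbeat_warning(30_000, 0))
    # Mismatch: heartbeats arrive too rarely to keep the connection alive.
    print(heartbeat_warning(5_000, 3_000))
    # Connection idle for 6 s with a 5 s interval: a heartbeat is due.
    print(should_send_heartbeat(10_000, 4_000, 5_000))
```

The sketch only warns and never adjusts the interval, which matches the
"keep it simple and explicit" position taken in the discussion below.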

On Mon, Feb 7, 2022 at 6:37 PM Ivan Daschinsky <iv...@gmail.com> wrote:

> I see potential in this feature, especially if we use something like
> continuous queries. Stale clients can consume a lot of resources, and it is
> worth kicking these clients out.
>
> Mon, Feb 7, 2022 at 18:25, Pavel Tupitsyn <pt...@apache.org>:
>
> > > If we use new approach, we can reduce this timeout. But this can affect
> > > old clients.
> >
> > idleTimeout is disabled by default, we are not going to change this.
> >
> > > Also, let's think about that sending heartbeats and interval of sending
> > > heartbeats could be calculated on the server side (i.e. one third of
> > > idle timeout) and sent to the client during handshake.
> > > Also we can introduce something like a negotiation mechanism as in
> > > zookeeper.
> >
> > I tend to agree with Maksim here, let's keep it simple and explicit.
> > Log a warning, but don't do anything clever.
> >
> > On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> > > >> idleTimeout already exists, I don't think we should change the way
> > > >> it works (or did I misunderstand you?)
> > > If we use new approach, we can reduce this timeout. But this can affect
> > > old clients.
> > >
> > >
> > > Also, let's think about that sending heartbeats and interval of sending
> > > heartbeats could be calculated on the server side (i.e. one third of
> > > idle timeout) and sent to the client during handshake.
> > > Also we can introduce something like a negotiation mechanism as in
> > > zookeeper.
> > >
> > >
> > > Mon, Feb 7, 2022 at 18:05, Pavel Tupitsyn <pt...@apache.org>:
> > >
> > > > Igor,
> > > >
> > > > > Maybe clients should pass this information on to the handshake.
> > > >
> > > > Do you think we should log a mismatched timeout warning on the
> > > > server, not on the client?
> > > > Or should we do both?
> > > >
> > > >
> > > > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other
> > > > details discussed above.
> > > >
> > > > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <is...@apache.org>
> > > > wrote:
> > > >
> > > > > Feature seems useful for me as it makes connection management more
> > > > > robust and predictable.
> > > > >
> > > > > I agree with Pavel, that we should print warning when heartbeat
> > > > > period is larger than idle timeout, but I see a problem here as idle
> > > > > timeout is configured on server and is not known to clients, while
> > > > > heartbeats configured on clients and their period is not known
> > > > > to the server. Maybe clients should pass this information on to the
> > > > > handshake.
> > > > >
> > > > > Regarding Python and PHP clients - can not we use some kind of
> > > > > timers to implement this feature?
> > > > >
> > > > > Best Regards,
> > > > > Igor
> > > > >
> > > > >
> > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Maksim, agree. Let's not be too clever and only log a warning.
> > > > > > >
> > > > > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Ivan, idleTimeout already exists, I don't think we should change
> > > > > > > > the way it works (or did I misunderstand you?)
> > > > > > > >
> > > > > > > > Of course, enabling heartbeats means that otherwise idle clients
> > > > > > > > will no longer be disconnected by the server.
> > > > > > > > I think we should cross-link those properties in the
> > > > > > > > documentation and explain this behavior.
> > > > > > >
> > > > > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout
> > > > > > > >> >>is not zero, server disconnects idle clients
> > > > > > >> >>
> > > > > > >> But I suppose it would be great to have:
> > > > > > >> 1. If client supports keep alive, use idleTimeout
> > > > > > >> 2. If not, do not use it.
> > > > > > >>
> > > > > > >> But I am not sure if it is correct or not.
> > > > > > >>
> > > > > > > >> Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org>:
> > > > > > > >>
> > > > > > > >> > I believe explicit is better than implicit :) Also in case of
> > > > > > > >> > dynamic calculation of timeout, it can change dynamically, for
> > > > > > > >> > example restarting a cluster with different configuration should
> > > > > > > >> > reconfigure clients too. Looks complicated.
> > > > > > > >> >
> > > > > > > >> > My vote for WARN + javadocs with mention of this issue.
> > > > > > >> >
> > > > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <
> > > > ptupitsyn@apache.org
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> >
> > > > > > >> > > > WDYT, should we add a WARN message for clients that
> > > configure
> > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server
> > > side?
> > > > > > >> > >
> > > > > > >> > > I think we should either log a WARN, or retrieve
> idleTimeout
> > > > from
> > > > > > >> server
> > > > > > >> > > and configure heartbeatTimeout accordingly (e.g. divide by
> > 2).
> > > > > > >> > > Thoughts?
> > > > > > >> > >
> > > > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <
> > > > > > >> timoninmaxim@apache.org>
> > > > > > >> > > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi Pavel,
> > > > > > >> > > >
> > > > > > >> > > > Thanks for the links. Yes, I forgot that the flag of
> > changed
> > > > > > >> topology
> > > > > > >> > is
> > > > > > >> > > > lazy. Also I missed that the keepAlive setting is
> > configured
> > > > on
> > > > > > the
> > > > > > >> > > client
> > > > > > >> > > > side (alternatively to idleTimeout that is on the server
> > > > side).
> > > > > > >> > > >
> > > > > > >> > > > Now I understand, this feature can be helpful then.
> Every
> > > > client
> > > > > > can
> > > > > > >> > > > configure itself in case it's possible to be idle
> > sometimes,
> > > > and
> > > > > > >> choose
> > > > > > >> > > > an appropriate timeout by itself too. And by default the
> > > > feature
> > > > > > >> should
> > > > > > >> > > be
> > > > > > >> > > > disabled.
> > > > > > >> > > >
> > > > > > >> > > > WDYT, should we add a WARN message for clients that
> > > configure
> > > > > > >> > > > keepAliveTimeout greater than idleTimeout on the server
> > > side?
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > >
> > > > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <
> > > > > > ptupitsyn@apache.org
> > > > > > >> >
> > > > > > >> > > > wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Ivan,
> > > > > > >> > > > >
> > > > > > >> > > > > I suggest the following:
> > > > > > >> > > > >
> > > > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means
> it
> > > > > accepts
> > > > > > >> > > > > OP_KEEP_ALIVE empty message
> > > > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is
> > idle
> > > > for
> > > > > a
> > > > > > >> > > > > certain period of time
> > > > > > >> > > > > 3. Already implemented: when
> > > > > > >> ClientConnectorConfiguration#idleTimeout
> > > > > > >> > > is
> > > > > > >> > > > > not zero, server disconnects idle clients
> > > > > > >> > > > >
> > > > > > >> > > > > This way we don't need server->client keepalives, as
> you
> > > > > > correctly
> > > > > > >> > > noted.
> > > > > > >> > > > >
> > > > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <
> > > > > > >> ivandasch@gmail.com
> > > > > > >> > >
> > > > > > >> > > > > wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Pavel, I suppose that ideally:
> > > > > > >> > > > > > 1. Client send in handshake flag, that it supports
> > > > > KEEP_ALIVE
> > > > > > >> > feature
> > > > > > >> > > > and
> > > > > > >> > > > > > server takes it into account.
> > > > > > >> > > > > > 2. Each request of client can be considered as
> > > keep-alive
> > > > > > ping.
> > > > > > >> > > > > > 3. Client send failure should be processed using
> retry
> > > > > policy.
> > > > > > >> > > > > > 4. Server should not send keep-alive packets, it is
> > > > > redundant,
> > > > > > >> but
> > > > > > >> > > > server
> > > > > > >> > > > > > should track requests from client and if there is no
> > > > > requests
> > > > > > >> from
> > > > > > >> > > > client
> > > > > > >> > > > > > with KEEP_ALIVE feature,
> > > > > > >> > > > > > automatically close connection and free resources.
> > > > > > >> > > > > >
> > > > > > >> > > > > > Similar approach is used in zookeeper clients.
> > > > > > >> > > > > >
> > > > > > > >> > > > > > Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <
> > > > > > >> ptupitsyn@apache.org
> > > > > > >> > >:
> > > > > > >> > > > > >
> > > > > > >> > > > > > > Ivan,
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Ideally, the check should come from both sides.
> > > > > > >> > > > > > > - Client periodically sends keepalive to server
> > > > > > >> > > > > > > - Server periodically sends keepalive to client
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Feature flags will be added accordingly, so it is
> > not
> > > > > > >> necessary
> > > > > > >> > to
> > > > > > >> > > > > > > implement this in all thin clients.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <
> > > > > > >> > > ivandasch@gmail.com
> > > > > > >> > > > >
> > > > > > >> > > > > > > wrote:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > I suppose it is great idea, but this
> functionality
> > > can
> > > > > be
> > > > > > >> hard
> > > > > > >> > to
> > > > > > >> > > > > > > implement
> > > > > > >> > > > > > > > for some platforms. I.e. sync python client or
> php
> > > > > (there
> > > > > > >> is no
> > > > > > >> > > > real
> > > > > > >> > > > > > > > multithreading for python (GIL) and php is
> single
> > > > > threaded
> > > > > > >> by
> > > > > > >> > > > > design).
> > > > > > >> > > > > > > But
> > > > > > >> > > > > > > > for async clients it is not very hard to
> > implement.
> > > > > > >> > Nevertheless,
> > > > > > >> > > > > this
> > > > > > >> > > > > > > > feature should be optional, because of possible
> > > > > technical
> > > > > > >> > > > > limitations.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Pavel, is this check mostly for client side? Or
> > > > servers
> > > > > > can
> > > > > > >> do
> > > > > > >> > > some
> > > > > > >> > > > > > > actions
> > > > > > >> > > > > > > > if there is no activity from thin client (i.e.
> > > closing
> > > > > > >> context
> > > > > > >> > > and
> > > > > > >> > > > > free
> > > > > > >> > > > > > > > resources such as queries' handles and so on?)
> > > > > > >> > > > > > > >
> > > > > > > >> > > > > > > > Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <
> > > > > > >> > > ptupitsyn@apache.org
> > > > > > >> > > > >:
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > Hi Maksim,
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > half-state is a possible situation when an
> > > Ignite
> > > > > node
> > > > > > >> goes
> > > > > > >> > > > down
> > > > > > >> > > > > or
> > > > > > >> > > > > > > > > somehow removes connection to a thin client
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Half-open state is also possible when, for
> > > example,
> > > > an
> > > > > > >> > > > intermediate
> > > > > > >> > > > > > > > router
> > > > > > >> > > > > > > > > is rebooted [1].
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > This is what we seem to have encountered with one of our customers - they
> > > > > > >> > > > > > > > > have a stable cluster, and long-living (multiple days) thin client
> > > > > > >> > > > > > > > > connections which can be idle for some time.
> > > > > > >> > > > > > > > > And only when we send some data on such an idle connection do we discover
> > > > > > >> > > > > > > > > that it is broken.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness feature clients can
> > > > > > >> > > > > > > > > > be notified about topology changes
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Partition awareness is a "lazy" notification in a form of a response
> > > > > > >> > > > > > > > > message flag [2].
> > > > > > >> > > > > > > > > You won't get one on an idle connection.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > the connections are removed on the server side by client idle timeout
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > is it OK to keep such connections alive for a long time
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > I think it is up to the user.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > in the case of partition awareness features it can lead to wasting TCP
> > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Can you please elaborate?
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > > > >> > > > > > > > > wrote:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > Hi Pavel,
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here to get the
> > > > > > >> > > > > > > > > > feature more clearly?
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > As I understand it correctly, half-state is a possible situation when an
> > > > > > >> > > > > > > > > > Ignite node goes down or somehow removes connection to a thin client. But
> > > > > > >> > > > > > > > > > with enabled (true by default) partitionAwareness feature clients can be
> > > > > > >> > > > > > > > > > notified about topology changes. So, there are possible cases:
> > > > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps fail fast.
> > > > > > >> > > > > > > > > > But is it OK to connect a client to a single node only?
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > For the second one: you mention that a case for the second option is
> > > > > > >> > > > > > > > > > "Long-living and mostly idle connections are especially susceptible to this
> > > > > > >> > > > > > > > > > behavior". If I understand correctly the connections are removed on the
> > > > > > >> > > > > > > > > > server side by client idle timeout. Can we just configure the idle timeout
> > > > > > >> > > > > > > > > > for cases where we really need keeping alive idle connections? Are there
> > > > > > >> > > > > > > > > > any other cases with unexpectedly dropped connections?
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > I'm wondering is it OK to keep such connections alive for a long time?
> > > > > > >> > > > > > > > > > Also in the case of partition awareness features it can lead to wasting TCP
> > > > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Thanks!
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > > > >> > > > > > > > > > wrote:
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > >> Igniters,
> > > > > > >> > > > > > > > > >>
> > > > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin client
> > > > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > > > >> > > > > > > > > >>
> > > > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > > > >> > > > > > > > > >>
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > --
> > > > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > --
> > > > > > >> > > > > > Sincerely yours, Ivan Daschinskiy
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Sincerely yours, Ivan Daschinskiy
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sincerely yours, Ivan Daschinskiy
> > >
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
I see potential in this feature, especially if we use something like
continuous queries. Stale clients can consume a lot of resources, and it is
worth kicking these clients out.
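
As a rough illustration of what kicking out stale clients by idle timeout amounts to on the server side, here is a minimal Java sketch. It is not Ignite code: IdleConnectionTracker, onRequest and evictIdle are hypothetical names, and the sketch only assumes that every request (a heartbeat included) counts as activity and that a timeout of zero disables eviction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not an Ignite class: remembers the last activity time per connection. */
final class IdleConnectionTracker {
    private final long idleTimeoutMillis;
    private final Map<String, Long> lastActivity = new ConcurrentHashMap<>();

    IdleConnectionTracker(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Any request, including an empty heartbeat, refreshes the connection. */
    void onRequest(String connectionId, long nowMillis) {
        lastActivity.put(connectionId, nowMillis);
    }

    /** Removes and returns connections idle for longer than the timeout; zero disables the check. */
    List<String> evictIdle(long nowMillis) {
        List<String> evicted = new ArrayList<>();
        if (idleTimeoutMillis <= 0)
            return evicted;

        for (Map.Entry<String, Long> e : lastActivity.entrySet()) {
            boolean idle = nowMillis - e.getValue() > idleTimeoutMillis;
            // remove(key, value) only succeeds if no newer activity was recorded meanwhile.
            if (idle && lastActivity.remove(e.getKey(), e.getValue()))
                evicted.add(e.getKey());
        }
        return evicted;
    }

    public static void main(String[] args) {
        IdleConnectionTracker tracker = new IdleConnectionTracker(1_000);
        tracker.onRequest("stale", 0);
        tracker.onRequest("active", 0);
        tracker.onRequest("active", 900); // e.g. a heartbeat
        System.out.println(tracker.evictIdle(1_500));
    }
}
```

With heartbeats enabled on a client, onRequest would be called at least once per heartbeat interval, so only clients that have really gone away would be evicted.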

On Mon, Feb 7, 2022 at 6:25 PM Pavel Tupitsyn <pt...@apache.org> wrote:

> > If we use new approach, we can reduce this timeout. But this can affect
> > old clients.
>
> idleTimeout is disabled by default, we are not going to change this.
>
> > Also, let's think about that sending heartbeats and interval of sending
> > heartbeats could be calculated on the server side (i.e. one third of idle
> > timeout) and sent to the client during handshake.
> > Also we can introduce something like a negotiation mechanism as in
> > zookeeper.
>
> I tend to agree with Maksim here, let's keep it simple and explicit.
> Log a warning, but don't do anything clever.


-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
> If we use new approach, we can reduce this timeout. But this can affect
> old clients.

idleTimeout is disabled by default, and we are not going to change this.

> Also, let's think about that sending heartbeats and interval of sending
> heartbeats could be calculated on the server side (i.e. one third of idle
> timeout) and sent to the client during handshake.
> Also we can introduce something like a negotiation mechanism as in
> zookeeper.

I tend to agree with Maksim here, let's keep it simple and explicit.
Log a warning, but don't do anything clever.
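
The "log a warning" option could look roughly like the following Java sketch. HeartbeatCheck and warningFor are hypothetical names, not part of the Ignite API. The sketch assumes the client already knows the server's idle timeout (for example via the OP_GET_IDLE_TIMEOUT operation mentioned in the proposal) and that a zero idle timeout means "disabled".

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical helper, not an Ignite class: explicit check, no automatic adjustment. */
final class HeartbeatCheck {
    /** Returns a warning text when the heartbeat interval exceeds the server idle timeout. */
    static Optional<String> warningFor(Duration heartbeatInterval, Duration serverIdleTimeout) {
        // Zero (or negative) idle timeout: the server never disconnects idle clients.
        if (serverIdleTimeout.isZero() || serverIdleTimeout.isNegative())
            return Optional.empty();

        if (heartbeatInterval.compareTo(serverIdleTimeout) > 0) {
            return Optional.of("Heartbeat interval (" + heartbeatInterval.toMillis()
                + " ms) is greater than server idle timeout (" + serverIdleTimeout.toMillis()
                + " ms); the server may close the connection between heartbeats.");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The configured value is left untouched; the user only gets a message.
        warningFor(Duration.ofSeconds(30), Duration.ofSeconds(10))
            .ifPresent(System.err::println);
    }
}
```

The check only reports the mismatch and leaves the user's setting as it is, which matches the "simple and explicit" preference above.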

On Mon, Feb 7, 2022 at 6:15 PM Ivan Daschinsky <iv...@gmail.com> wrote:

> >> idleTimeout already exists, I don't think we should change the way it
> >> works (or did I misunderstand you?)
> If we use new approach, we can reduce this timeout. But this can affect old
> clients.
>
>
> Also, let's think about that sending heartbeats and interval of sending
> heartbeats could be calculated on the server side (i.e. one third of idle
> timeout) and sent to the client during handshake.
> Also we can introduce something like a negotiation mechanism as in
> zookeeper.
>
>
> On Mon, Feb 7, 2022 at 6:05 PM Pavel Tupitsyn <pt...@apache.org> wrote:
>
> > Igor,
> >
> > > Maybe clients should pass this information on to the handshake.
> >
> > Do you think we should log a mismatched timeout warning on the server, not
> > on the client?
> > Or should we do both?
> >
> >
> > I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other details
> > discussed above.
> >
> > On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <is...@apache.org> wrote:
> >
> > > Feature seems useful for me as it makes connection management more robust
> > > and predictable.
> > >
> > > I agree with Pavel, that we should print warning when heartbeat period is
> > > larger than idle timeout, but I see a problem here as idle timeout is
> > > configured on server and is not known to clients, while heartbeats
> > > configured on clients and their period is not known to the server. Maybe
> > > clients should pass this information on to the handshake.
> > >
> > > Regarding Python and PHP clients - can not we use some kind of timers to
> > > implement this feature?
> > >
> > > Best Regards,
> > > Igor
> > >
> > >
> > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <pt...@apache.org>
> > > wrote:
> > >
> > > > Maksim, agree. Let's not be too clever and only log a warning.
> > > >
> > > > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <pt...@apache.org>
> > > > wrote:
> > > >
> > > > > Ivan, idleTimeout already exists, I don't think we should change the way
> > > > > it works (or did I misunderstand you?)
> > > > >
> > > > > Of course, enabling heartbeats means that otherwise idle clients will no
> > > > > longer be disconnected by the server.
> > > > > I think we should cross-link those properties in the documentation and
> > > > > explain this behavior.
> > > > >
> > > > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout is not zero, server disconnects idle clients
> > > > >> >>
> > > > >> But I suppose it would be great to have:
> > > > >> 1. If client supports keep alive, use idleTimeout
> > > > >> 2. If not, do not use it.
> > > > >>
> > > > >> But I am not sure if it is correct or not.
> > > > >>
> > > > >> On Mon, Feb 7, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org> wrote:
> > > > >>
> > > > >> > I believe explicit is better than implicit :) Also in case of dynamic
> > > > >> > calculation of timeout, it can change dynamically, for example restarting a
> > > > >> > cluster with different configuration should reconfigure clients too. Looks
> > > > >> > complicated.
> > > > >> >
> > > > >> > My vote for WARN + javadocs with mention of this issue.
> > > > >> >
> > > > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > > WDYT, should we add a WARN message for clients that configure
> > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > > > >> > >
> > > > >> > > I think we should either log a WARN, or retrieve idleTimeout from server
> > > > >> > > and configure heartbeatTimeout accordingly (e.g. divide by 2).
> > > > >> > > Thoughts?
> > > > >> > >
> > > > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Hi Pavel,
> > > > >> > > >
> > > > >> > > > Thanks for the links. Yes, I forgot that the flag of changed topology is
> > > > >> > > > lazy. Also I missed that the keepAlive setting is configured on the client
> > > > >> > > > side (alternatively to idleTimeout that is on the server side).
> > > > >> > > >
> > > > >> > > > Now I understand, this feature can be helpful then. Every client can
> > > > >> > > > configure itself in case it's possible to be idle sometimes, and choose
> > > > >> > > > an appropriate timeout by itself too. And by default the feature should be
> > > > >> > > > disabled.
> > > > >> > > >
> > > > >> > > > WDYT, should we add a WARN message for clients that configure
> > > > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Ivan,
> > > > >> > > > >
> > > > >> > > > > I suggest the following:
> > > > >> > > > >
> > > > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> > > > >> > > > > OP_KEEP_ALIVE empty message
> > > > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> > > > >> > > > > certain period of time
> > > > >> > > > > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> > > > >> > > > > not zero, server disconnects idle clients
> > > > >> > > > >
> > > > >> > > > > This way we don't need server->client keepalives, as you correctly noted.
> > > > >> > > > >
> > > > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <ivandasch@gmail.com>
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Pavel, I suppose that ideally:
> > > > >> > > > > > 1. The client sends a flag in the handshake indicating that it supports
> > > > >> > > > > > the KEEP_ALIVE feature, and the server takes it into account.
> > > > >> > > > > > 2. Each client request can be considered a keep-alive ping.
> > > > >> > > > > > 3. A client send failure should be processed using the retry policy.
> > > > >> > > > > > 4. The server should not send keep-alive packets, as that is redundant,
> > > > >> > > > > > but it should track requests from the client and, if there are no
> > > > >> > > > > > requests from a client with the KEEP_ALIVE feature, automatically
> > > > >> > > > > > close the connection and free resources.
> > > > >> > > > > >
> > > > >> > > > > > A similar approach is used in ZooKeeper clients.
> > > > >> > > > > >
> > > > >> > > > > > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Ivan,
> > > > >> > > > > > >
> > > > >> > > > > > > Ideally, the check should come from both sides.
> > > > >> > > > > > > - Client periodically sends keepalive to server
> > > > >> > > > > > > - Server periodically sends keepalive to client
> > > > >> > > > > > >
> > > > >> > > > > > > Feature flags will be added accordingly, so it is not necessary to
> > > > >> > > > > > > implement this in all thin clients.
> > > > >> > > > > > >
> > > > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com>
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > I suppose it is a great idea, but this functionality can be hard to
> > > > >> > > > > > > > implement for some platforms, e.g. the sync Python client or PHP
> > > > >> > > > > > > > (there is no real multithreading in Python because of the GIL, and
> > > > >> > > > > > > > PHP is single-threaded by design). But for async clients it is not
> > > > >> > > > > > > > very hard to implement. Nevertheless, this feature should be
> > > > >> > > > > > > > optional, because of possible technical limitations.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Pavel, is this check mostly for the client side? Or can servers take
> > > > >> > > > > > > > some action if there is no activity from a thin client (e.g. closing
> > > > >> > > > > > > > the context and freeing resources such as query handles and so on)?
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi Maksim,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > half-state is a possible situation when an Ignite node goes down or
> > > > >> > > > > > > > > > somehow removes connection to a thin client
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Half-open state is also possible when, for example, an intermediate router
> > > > >> > > > > > > > > is rebooted [1].
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > This is what we seem to have encountered with one of our customers - they
> > > > >> > > > > > > > > have a stable cluster, and long-living (multiple days) thin client
> > > > >> > > > > > > > > connections which can be idle for some time.
> > > > >> > > > > > > > > And only when we send some data on such an idle connection do we discover
> > > > >> > > > > > > > > that it is broken.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > But with enabled (true by default) partitionAwareness feature clients can
> > > > >> > > > > > > > > > be notified about topology changes
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Partition awareness is a "lazy" notification in a form of a response
> > > > >> > > > > > > > > message flag [2].
> > > > >> > > > > > > > > You won't get one on an idle connection.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > the connections are removed on the server side by client idle timeout
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Idle timeout is disabled by default.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > is it OK to keep such connections alive for a long time
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I think it is up to the user.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > in the case of partition awareness features it can lead to wasting TCP
> > > > >> > > > > > > > > > sockets on Ignite nodes, can't it
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Can you please elaborate?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org>
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > Hi Pavel,
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks for starting this thread! Can I ask some questions here to get the
> > > > >> > > > > > > > > > feature more clearly?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > As I understand it correctly, half-state is a possible situation when an
> > > > >> > > > > > > > > > Ignite node goes down or somehow removes connection to a thin client. But
> > > > >> > > > > > > > > > with enabled (true by default) partitionAwareness feature clients can be
> > > > >> > > > > > > > > > notified about topology changes. So, there are possible cases:
> > > > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > > > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > I like the idea for the case with a single node, as it helps fail fast.
> > > > >> > > > > > > > > > But is it OK to connect a client to a single node only?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > For the second one: you mention that a case for the second option is
> > > > >> > > > > > > > > > "Long-living and mostly idle connections are especially susceptible to this
> > > > >> > > > > > > > > > behavior". If I understand correctly the connections are removed on the
> > > > >> > > > > > > > > > server side by client idle timeout. Can we just configure the idle timeout
> > > > >> > > > > > > > > > for cases where we really need keeping alive idle connections? Are there
> > > > >> > > > > > > > > > any other cases with unexpectedly dropped connections?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > I'm wondering is it OK to keep such connections alive for a long time?
> > > > >> > > > > > > > > > Also in the case of partition awareness features it can lead to wasting TCP
> > > > >> > > > > > > > > > sockets on Ignite nodes, can't it?
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > Thanks!
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > >> Igniters,
> > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > >> Please review the proposal to add heartbeat messages to the thin client
> > > > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > >> > > > > > > > > >>
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > > > --
> > > > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > > >
> > > > >> > > > > > --
> > > > >> > > > > > Sincerely yours, Ivan Daschinskiy
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Sincerely yours, Ivan Daschinskiy
> > > > >>
> > > > >
> > > >
> > >
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
>> idleTimeout already exists, I don't think we should change the way it
>> works (or did I misunderstand you?)
If we use the new approach, we can reduce this timeout. But this can affect old
clients.

Also, let's consider having the server side decide both whether heartbeats
are sent and how often: the interval could be calculated on the server
(e.g. one third of the idle timeout) and sent to the client during the
handshake.
We could also introduce something like a negotiation mechanism, as in
ZooKeeper.


-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Igor,

> Maybe clients should pass this information on to the handshake.

Do you think we should log a mismatched timeout warning on the server, not
on the client?
Or should we do both?


I've updated the proposal with OP_GET_IDLE_TIMEOUT and some other details
discussed above.
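
For illustration, the client-side variant of the mismatch check could look like the sketch below. It assumes the client has already obtained the server's idle timeout (for example via OP_GET_IDLE_TIMEOUT; how that value is fetched is not shown here), and the function and logger names are hypothetical.

```python
import logging
from typing import Optional

log = logging.getLogger("thin_client.heartbeat")  # hypothetical logger name


def check_heartbeat_interval(heartbeat_interval_ms: int,
                             server_idle_timeout_ms: int) -> Optional[str]:
    """Warn when heartbeats are too rare to keep the connection alive.

    Returns the warning text, or None when the settings are compatible.
    Non-positive values are treated as "disabled" (an assumption of this sketch).
    """
    if heartbeat_interval_ms <= 0 or server_idle_timeout_ms <= 0:
        return None
    if heartbeat_interval_ms >= server_idle_timeout_ms:
        msg = (
            "Heartbeat interval (%d ms) is not less than the server idle timeout "
            "(%d ms); the server may close idle connections before a heartbeat is sent."
            % (heartbeat_interval_ms, server_idle_timeout_ms)
        )
        log.warning(msg)
        return msg
    return None
```

The same comparison could run on the server instead if clients passed their heartbeat interval in the handshake.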

On Mon, Feb 7, 2022 at 5:42 PM Igor Sapego <is...@apache.org> wrote:

> The feature seems useful to me, as it makes connection management more robust
> and predictable.
>
> I agree with Pavel that we should print a warning when the heartbeat period is
> larger than the idle timeout, but I see a problem here: the idle timeout is
> configured on the server and is not known to clients, while heartbeats are
> configured on clients and their period is not known to the server.
> Maybe clients should pass this information on to the handshake.
>
> Regarding Python and PHP clients - can't we use some kind of timers to
> implement this feature?
>
> Best Regards,
> Igor
>
>
> On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <pt...@apache.org>
> wrote:
>
> > Maksim, agree. Let's not be too clever and only log a warning.
> >
> > On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <pt...@apache.org>
> > wrote:
> >
> > > Ivan, idleTimeout already exists, I don't think we should change the way
> > > it works (or did I misunderstand you?)
> > >
> > > Of course, enabling heartbeats means that otherwise idle clients will no
> > > longer be disconnected by the server.
> > > I think we should cross-link those properties in the documentation and
> > > explain this behavior.
> > >
> > > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <iv...@gmail.com>
> > > wrote:
> > >
> > >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> > >> >>not zero, server disconnects idle clients
> > >> >>
> > >> But I suppose it would be great to have:
> > >> 1. If client supports keep alive, use idleTimeout
> > >> 2. If not, do not use it.
> > >>
> > >> But I am not sure if it is correct or not.
> > >>
> > >> On Mon, Feb 7, 2022 at 16:01, Maksim Timonin <timoninmaxim@apache.org> wrote:
> > >>
> > >> > I believe explicit is better than implicit :) Also in case of dynamic
> > >> > calculation of timeout, it can change dynamically, for example restarting a
> > >> > cluster with different configuration should reconfigure clients too. Looks
> > >> > complicated.
> > >> >
> > >> > My vote for WARN + javadocs with mention of this issue.
> > >> >
> > >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > >> > wrote:
> > >> >
> > >> > > > WDYT, should we add a WARN message for clients that configure
> > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > >> > >
> > >> > > I think we should either log a WARN, or retrieve idleTimeout from server
> > >> > > and configure heartbeatTimeout accordingly (e.g. divide by 2).
> > >> > > Thoughts?
> > >> > >
> > >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <timoninmaxim@apache.org>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi Pavel,
> > >> > > >
> > >> > > > Thanks for the links. Yes, I forgot that the flag of changed topology is
> > >> > > > lazy. Also I missed that the keepAlive setting is configured on the client
> > >> > > > side (alternatively to idleTimeout that is on the server side).
> > >> > > >
> > >> > > > Now I understand, this feature can be helpful then. Every client can
> > >> > > > configure itself in case it's possible to be idle sometimes, and choose
> > >> > > > an appropriate timeout by itself too. And by default the feature should be
> > >> > > > disabled.
> > >> > > >
> > >> > > > WDYT, should we add a WARN message for clients that configure
> > >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Ivan,
> > >> > > > >
> > >> > > > > I suggest the following:
> > >> > > > >
> > >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> > >> > > > > OP_KEEP_ALIVE empty message
> > >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> > >> > > > > certain period of time
> > >> > > > > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> > >> > > > > not zero, server disconnects idle clients
> > >> > > > >
> > >> > > > > This way we don't need server->client keepalives, as you correctly noted.
> > >> > > > >
> > >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <ivandasch@gmail.com>
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Pavel, I suppose that ideally:
> > >> > > > > > 1. Client send in handshake flag, that it supports
> KEEP_ALIVE
> > >> > feature
> > >> > > > and
> > >> > > > > > server takes it into account.
> > >> > > > > > 2. Each request of client can be considered as keep-alive
> > ping.
> > >> > > > > > 3. Client send failure should be processed using retry
> policy.
> > >> > > > > > 4. Server should not send keep-alive packets, it is
> redundant,
> > >> but
> > >> > > > server
> > >> > > > > > should track requests from client and if there is no
> requests
> > >> from
> > >> > > > client
> > >> > > > > > with KEEP_ALIVE feature,
> > >> > > > > > automatically close connection and free resources.
> > >> > > > > >
> > >> > > > > > Similar approach is used in zookeeper clients.
> > >> > > > > >
> > >> > > > > > On Mon, Feb 7, 2022 at 12:24 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > >> > > > > >
> > >> > > > > > > Ivan,
> > >> > > > > > >
> > >> > > > > > > Ideally, the check should come from both sides.
> > >> > > > > > > - Client periodically sends keepalive to server
> > >> > > > > > > - Server periodically sends keepalive to client
> > >> > > > > > >
> > >> > > > > > > Feature flags will be added accordingly, so it is not
> > >> necessary
> > >> > to
> > >> > > > > > > implement this in all thin clients.
> > >> > > > > > >
> > >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <
> > >> > > ivandasch@gmail.com
> > >> > > > >
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > I suppose it is great idea, but this functionality can
> be
> > >> hard
> > >> > to
> > >> > > > > > > implement
> > >> > > > > > > > for some platforms. I.e. sync python client or php
> (there
> > >> is no
> > >> > > > real
> > >> > > > > > > > multithreading for python (GIL) and php is single
> threaded
> > >> by
> > >> > > > > design).
> > >> > > > > > > But
> > >> > > > > > > > for async clients it is not very hard to implement.
> > >> > Nevertheless,
> > >> > > > > this
> > >> > > > > > > > feature should be optional, because of possible
> technical
> > >> > > > > limitations.
> > >> > > > > > > >
> > >> > > > > > > > Pavel, is this check mostly for client side? Or servers
> > can
> > >> do
> > >> > > some
> > >> > > > > > > actions
> > >> > > > > > > > if there is no activity from thin client (i.e. closing
> > >> context
> > >> > > and
> > >> > > > > free
> > >> > > > > > > > resources such as queries' handles and so on?)
> > >> > > > > > > >
> > >> > > > > > > > On Mon, Feb 7, 2022 at 11:09 AM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > >> > > > > > > >
> > >> > > > > > > > > Hi Maksim,
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > > half-state is a possible situation when an Ignite
> node
> > >> goes
> > >> > > > down
> > >> > > > > or
> > >> > > > > > > > > somehow removes connection to a thin client
> > >> > > > > > > > >
> > >> > > > > > > > > Half-open state is also possible when, for example, an
> > >> > > > intermediate
> > >> > > > > > > > router
> > >> > > > > > > > > is rebooted [1].
> > >> > > > > > > > >
> > >> > > > > > > > > This is what we seem to have encountered with one of
> our
> > >> > > > customers
> > >> > > > > -
> > >> > > > > > > they
> > >> > > > > > > > > have a stable cluster, and long-living (multiple days)
> > >> thin
> > >> > > > client
> > >> > > > > > > > > connections which can be idle for some time.
> > >> > > > > > > > > And only when we send some data on such an idle
> > >> connection do
> > >> > > we
> > >> > > > > > > discover
> > >> > > > > > > > > that it is broken.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > > But with enabled (true by default)
> partitionAwareness
> > >> > feature
> > >> > > > > > clients
> > >> > > > > > > > can
> > >> > > > > > > > > be notified about topology changes
> > >> > > > > > > > >
> > >> > > > > > > > > Partition awareness is a "lazy" notification in a form
> > of
> > >> a
> > >> > > > > response
> > >> > > > > > > > > message flag [2].
> > >> > > > > > > > > You won't get one on an idle connection.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > > the connections are removed on the server side by
> > client
> > >> > idle
> > >> > > > > > timeout
> > >> > > > > > > > >
> > >> > > > > > > > > Idle timeout is disabled by default.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > > is it OK to keep such connections alive for a long
> > time
> > >> > > > > > > > >
> > >> > > > > > > > > I think it is up to the user.
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > > in the case of partition awareness features it can
> > lead
> > >> to
> > >> > > > > wasting
> > >> > > > > > > TCP
> > >> > > > > > > > > sockets on Ignite nodes, can't it
> > >> > > > > > > > >
> > >> > > > > > > > > Can you please elaborate?
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > >> > > > > > > > >
> > >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <
> > >> > > > > > timoninmaxim@apache.org
> > >> > > > > > > >
> > >> > > > > > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Hi Pavel,
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks for starting this thread! Can I ask some
> > >> questions
> > >> > > here
> > >> > > > to
> > >> > > > > > get
> > >> > > > > > > > the
> > >> > > > > > > > > > feature more clearly?
> > >> > > > > > > > > >
> > >> > > > > > > > > > As I understand it correctly, half-state is a
> possible
> > >> > > > situation
> > >> > > > > > when
> > >> > > > > > > > an
> > >> > > > > > > > > > Ignite node goes down or somehow removes connection
> > to a
> > >> > thin
> > >> > > > > > client.
> > >> > > > > > > > But
> > >> > > > > > > > > > with enabled (true by default) partitionAwareness
> > >> feature
> > >> > > > clients
> > >> > > > > > can
> > >> > > > > > > > be
> > >> > > > > > > > > > notified about topology changes. So, there are
> > possible
> > >> > > cases:
> > >> > > > > > > > > > 1. ThinClient connects to a single node.
> > >> > > > > > > > > > 2. Ignite node removes connection from itself.
> > >> > > > > > > > > >
> > >> > > > > > > > > > I like the idea for the case with a single node, as
> it
> > >> > helps
> > >> > > > fail
> > >> > > > > > > fast.
> > >> > > > > > > > > > But is it OK to connect a client to a single node
> > only?
> > >> > > > > > > > > >
> > >> > > > > > > > > > For the second one: you mention that a case for the
> > >> second
> > >> > > > option
> > >> > > > > > is
> > >> > > > > > > > > > "Long-living and mostly idle connections are
> > especially
> > >> > > > > susceptible
> > >> > > > > > > to
> > >> > > > > > > > > this
> > >> > > > > > > > > > behavior". If I understand correctly the connections
> > are
> > >> > > > removed
> > >> > > > > on
> > >> > > > > > > the
> > >> > > > > > > > > > server side by client idle timeout. Can we just
> > >> configure
> > >> > the
> > >> > > > > idle
> > >> > > > > > > > > timeout
> > >> > > > > > > > > > for cases where we really need keeping alive idle
> > >> > > connections?
> > >> > > > > Are
> > >> > > > > > > > there
> > >> > > > > > > > > > any other cases with unexpectedly dropped
> connections?
> > >> > > > > > > > > >
> > >> > > > > > > > > > I'm wondering is it OK to keep such connections
> alive
> > >> for a
> > >> > > > long
> > >> > > > > > > time?
> > >> > > > > > > > > > Also in the case of partition awareness features it
> > can
> > >> > lead
> > >> > > to
> > >> > > > > > > wasting
> > >> > > > > > > > > TCP
> > >> > > > > > > > > > sockets on Ignite nodes, can't it?
> > >> > > > > > > > > >
> > >> > > > > > > > > > Thanks!
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <
> > >> > > > > > ptupitsyn@apache.org>
> > >> > > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > >> Igniters,
> > >> > > > > > > > > >>
> > >> > > > > > > > > >> Please review the proposal to add heartbeat
> messages
> > to
> > >> > the
> > >> > > > thin
> > >> > > > > > > > client
> > >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your
> > >> thoughts:
> > >> > > > > > > > > >>
> > >> > > > > > > > > >>
> > >> > > > > > > > > >>
> > >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > >> > > > > > > > > >>
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > --
> > >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Sincerely yours, Ivan Daschinskiy
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >>
> > >> --
> > >> Sincerely yours, Ivan Daschinskiy
> > >>
> > >
> >
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Igor Sapego <is...@apache.org>.
The feature seems useful to me, as it makes connection management more robust
and predictable.

I agree with Pavel that we should print a warning when the heartbeat period is
larger than the idle timeout, but I see a problem here: the idle timeout is
configured on the server and is not known to clients, while heartbeats are
configured on clients and their period is not known to the server. Maybe
clients should pass this information in the handshake.

Regarding the Python and PHP clients: can't we use some kind of timer to
implement this feature?

Best Regards,
Igor
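
For illustration only, here is a minimal sketch of the timer idea for a
synchronous Python client. It assumes a hypothetical send_heartbeat callable
that writes the OP_KEEP_ALIVE message proposed earlier in this thread; the
class and method names are made up for this example and are not part of any
Ignite client. It uses only threading.Timer from the standard library.

```python
import threading
import time


class IdleHeartbeat:
    """Calls send_heartbeat() when no request was recorded for `interval` seconds."""

    def __init__(self, send_heartbeat, interval):
        self._send = send_heartbeat      # hypothetical: writes OP_KEEP_ALIVE
        self._interval = interval        # seconds
        self._lock = threading.Lock()    # guards the fields below
        self._last_activity = time.monotonic()
        self._timer = None
        self._closed = False

    def start(self):
        self._schedule(self._interval)

    def touch(self):
        """Record regular traffic; any request counts as a keep-alive."""
        with self._lock:
            self._last_activity = time.monotonic()

    def close(self):
        with self._lock:
            self._closed = True
            if self._timer is not None:
                self._timer.cancel()

    def _schedule(self, delay):
        with self._lock:
            if self._closed:
                return
            self._timer = threading.Timer(delay, self._tick)
            self._timer.daemon = True
            self._timer.start()

    def _tick(self):
        with self._lock:
            if self._closed:
                return
            idle = time.monotonic() - self._last_activity
        if idle >= self._interval:
            # The connection has been idle long enough: send a heartbeat.
            self._send()
            self.touch()
            idle = 0.0
        # Wake up again when the current idle period would expire.
        self._schedule(self._interval - idle)
```

A client would call touch() after every regular request, so heartbeats are
sent only on connections that are actually idle. A real implementation would
also need to serialize socket writes between the timer thread and user
requests, which this sketch leaves out.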


On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <pt...@apache.org> wrote:

> Maksim, agree. Let's not be too clever and only log a warning.
>
> On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <pt...@apache.org>
> wrote:
>
> > Ivan, idleTimeout already exists, I don't think we should change the way
> > it works (or did I misunderstand you?)
> >
> > Of course, enabling heartbeats means that otherwise idle clients will no
> > longer be disconnected by the server.
> > I think we should cross-link those properties in the documentation and
> > explain this behavior.
> >
> > On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> >> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout
> is
> >> not zero, server disconnects idle clients
> >> >>
> >> But I suppose it would be great to have:
> >> 1. If client supports keep alive, use idleTimeout
> >> 2. If not, do not use it.
> >>
> >> But I am not sure if it is correct or not.
> >>
> >> On Mon, Feb 7, 2022 at 4:01 PM Maksim Timonin <ti...@apache.org> wrote:
> >>
> >> > I believe explicit is better than implicit :) Also in case of dynamic
> >> > calculation of timeout, it can change dynamically, for example
> >> restarting a
> >> > cluster with different configuration should reconfigure clients too.
> >> Looks
> >> > complicated.
> >> >
> >> > My vote for WARN + javadocs with mention of this issue.
> >> >
> >> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <pt...@apache.org>
> >> > wrote:
> >> >
> >> > > > WDYT, should we add a WARN message for clients that configure
> >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> >> > >
> >> > > I think we should either log a WARN, or retrieve idleTimeout from
> >> server
> >> > > and configure heartbeatTimeout accordingly (e.g. divide by 2).
> >> > > Thoughts?
> >> > >
> >> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <
> >> timoninmaxim@apache.org>
> >> > > wrote:
> >> > >
> >> > > > Hi Pavel,
> >> > > >
> >> > > > Thanks for the links. Yes, I forgot that the flag of changed
> >> topology
> >> > is
> >> > > > lazy. Also I missed that the keepAlive setting is configured on
> the
> >> > > client
> >> > > > side (alternatively to idleTimeout that is on the server side).
> >> > > >
> >> > > > Now I understand, this feature can be helpful then. Every client
> can
> >> > > > configure itself in case it's possible to be idle sometimes, and
> >> choose
> >> > > > an appropriate timeout by itself too. And by default the feature
> >> should
> >> > > be
> >> > > > disabled.
> >> > > >
> >> > > > WDYT, should we add a WARN message for clients that configure
> >> > > > keepAliveTimeout greater than idleTimeout on the server side?
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <
> ptupitsyn@apache.org
> >> >
> >> > > > wrote:
> >> > > >
> >> > > > > Ivan,
> >> > > > >
> >> > > > > I suggest the following:
> >> > > > >
> >> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> >> > > > > OP_KEEP_ALIVE empty message
> >> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> >> > > > > certain period of time
> >> > > > > 3. Already implemented: when
> >> ClientConnectorConfiguration#idleTimeout
> >> > > is
> >> > > > > not zero, server disconnects idle clients
> >> > > > >
> >> > > > > This way we don't need server->client keepalives, as you
> correctly
> >> > > noted.
> >> > > > >
> >> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <
> >> ivandasch@gmail.com
> >> > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Pavel, I suppose that ideally:
> >> > > > > > 1. Client send in handshake flag, that it supports KEEP_ALIVE
> >> > feature
> >> > > > and
> >> > > > > > server takes it into account.
> >> > > > > > 2. Each request of client can be considered as keep-alive
> ping.
> >> > > > > > 3. Client send failure should be processed using retry policy.
> >> > > > > > 4. Server should not send keep-alive packets, it is redundant,
> >> but
> >> > > > server
> >> > > > > > should track requests from client and if there is no requests
> >> from
> >> > > > client
> >> > > > > > with KEEP_ALIVE feature,
> >> > > > > > automatically close connection and free resources.
> >> > > > > >
> >> > > > > > Similar approach is used in zookeeper clients.
> >> > > > > >
> >> > > > > > On Mon, Feb 7, 2022 at 12:24 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> >> > > > > >
> >> > > > > > > Ivan,
> >> > > > > > >
> >> > > > > > > Ideally, the check should come from both sides.
> >> > > > > > > - Client periodically sends keepalive to server
> >> > > > > > > - Server periodically sends keepalive to client
> >> > > > > > >
> >> > > > > > > Feature flags will be added accordingly, so it is not
> >> necessary
> >> > to
> >> > > > > > > implement this in all thin clients.
> >> > > > > > >
> >> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <
> >> > > ivandasch@gmail.com
> >> > > > >
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > I suppose it is great idea, but this functionality can be
> >> hard
> >> > to
> >> > > > > > > implement
> >> > > > > > > > for some platforms. I.e. sync python client or php (there
> >> is no
> >> > > > real
> >> > > > > > > > multithreading for python (GIL) and php is single threaded
> >> by
> >> > > > > design).
> >> > > > > > > But
> >> > > > > > > > for async clients it is not very hard to implement.
> >> > Nevertheless,
> >> > > > > this
> >> > > > > > > > feature should be optional, because of possible technical
> >> > > > > limitations.
> >> > > > > > > >
> >> > > > > > > > Pavel, is this check mostly for client side? Or servers
> can
> >> do
> >> > > some
> >> > > > > > > actions
> >> > > > > > > > if there is no activity from thin client (i.e. closing
> >> context
> >> > > and
> >> > > > > free
> >> > > > > > > > resources such as queries' handles and so on?)
> >> > > > > > > >
> >> > > > > > > > On Mon, Feb 7, 2022 at 11:09 AM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> >> > > > > > > >
> >> > > > > > > > > Hi Maksim,
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > half-state is a possible situation when an Ignite node
> >> goes
> >> > > > down
> >> > > > > or
> >> > > > > > > > > somehow removes connection to a thin client
> >> > > > > > > > >
> >> > > > > > > > > Half-open state is also possible when, for example, an
> >> > > > intermediate
> >> > > > > > > > router
> >> > > > > > > > > is rebooted [1].
> >> > > > > > > > >
> >> > > > > > > > > This is what we seem to have encountered with one of our
> >> > > > customers
> >> > > > > -
> >> > > > > > > they
> >> > > > > > > > > have a stable cluster, and long-living (multiple days)
> >> thin
> >> > > > client
> >> > > > > > > > > connections which can be idle for some time.
> >> > > > > > > > > And only when we send some data on such an idle
> >> connection do
> >> > > we
> >> > > > > > > discover
> >> > > > > > > > > that it is broken.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > But with enabled (true by default) partitionAwareness
> >> > feature
> >> > > > > > clients
> >> > > > > > > > can
> >> > > > > > > > > be notified about topology changes
> >> > > > > > > > >
> >> > > > > > > > > Partition awareness is a "lazy" notification in a form
> of
> >> a
> >> > > > > response
> >> > > > > > > > > message flag [2].
> >> > > > > > > > > You won't get one on an idle connection.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > the connections are removed on the server side by
> client
> >> > idle
> >> > > > > > timeout
> >> > > > > > > > >
> >> > > > > > > > > Idle timeout is disabled by default.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > is it OK to keep such connections alive for a long
> time
> >> > > > > > > > >
> >> > > > > > > > > I think it is up to the user.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > > in the case of partition awareness features it can
> lead
> >> to
> >> > > > > wasting
> >> > > > > > > TCP
> >> > > > > > > > > sockets on Ignite nodes, can't it
> >> > > > > > > > >
> >> > > > > > > > > Can you please elaborate?
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> >> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> >> > > > > > > > >
> >> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <
> >> > > > > > timoninmaxim@apache.org
> >> > > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hi Pavel,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks for starting this thread! Can I ask some
> >> questions
> >> > > here
> >> > > > to
> >> > > > > > get
> >> > > > > > > > the
> >> > > > > > > > > > feature more clearly?
> >> > > > > > > > > >
> >> > > > > > > > > > As I understand it correctly, half-state is a possible
> >> > > > situation
> >> > > > > > when
> >> > > > > > > > an
> >> > > > > > > > > > Ignite node goes down or somehow removes connection
> to a
> >> > thin
> >> > > > > > client.
> >> > > > > > > > But
> >> > > > > > > > > > with enabled (true by default) partitionAwareness
> >> feature
> >> > > > clients
> >> > > > > > can
> >> > > > > > > > be
> >> > > > > > > > > > notified about topology changes. So, there are
> possible
> >> > > cases:
> >> > > > > > > > > > 1. ThinClient connects to a single node.
> >> > > > > > > > > > 2. Ignite node removes connection from itself.
> >> > > > > > > > > >
> >> > > > > > > > > > I like the idea for the case with a single node, as it
> >> > helps
> >> > > > fail
> >> > > > > > > fast.
> >> > > > > > > > > > But is it OK to connect a client to a single node
> only?
> >> > > > > > > > > >
> >> > > > > > > > > > For the second one: you mention that a case for the
> >> second
> >> > > > option
> >> > > > > > is
> >> > > > > > > > > > "Long-living and mostly idle connections are
> especially
> >> > > > > susceptible
> >> > > > > > > to
> >> > > > > > > > > this
> >> > > > > > > > > > behavior". If I understand correctly the connections
> are
> >> > > > removed
> >> > > > > on
> >> > > > > > > the
> >> > > > > > > > > > server side by client idle timeout. Can we just
> >> configure
> >> > the
> >> > > > > idle
> >> > > > > > > > > timeout
> >> > > > > > > > > > for cases where we really need keeping alive idle
> >> > > connections?
> >> > > > > Are
> >> > > > > > > > there
> >> > > > > > > > > > any other cases with unexpectedly dropped connections?
> >> > > > > > > > > >
> >> > > > > > > > > > I'm wondering is it OK to keep such connections alive
> >> for a
> >> > > > long
> >> > > > > > > time?
> >> > > > > > > > > > Also in the case of partition awareness features it
> can
> >> > lead
> >> > > to
> >> > > > > > > wasting
> >> > > > > > > > > TCP
> >> > > > > > > > > > sockets on Ignite nodes, can't it?
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks!
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <
> >> > > > > > ptupitsyn@apache.org>
> >> > > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > >> Igniters,
> >> > > > > > > > > >>
> >> > > > > > > > > >> Please review the proposal to add heartbeat messages
> to
> >> > the
> >> > > > thin
> >> > > > > > > > client
> >> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your
> >> thoughts:
> >> > > > > > > > > >>
> >> > > > > > > > > >>
> >> > > > > > > > > >>
> >> > > > > > > > > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> >> > > > > > > > > >>
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > > Sincerely yours, Ivan Daschinskiy
> >> > > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > > Sincerely yours, Ivan Daschinskiy
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >>
> >> --
> >> Sincerely yours, Ivan Daschinskiy
> >>
> >
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Maksim, agree. Let's not be too clever and only log a warning.
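
A rough sketch of what such a warning check could look like, written in
Python purely for illustration. It assumes the client has somehow learned the
server's idle timeout, which (as noted elsewhere in this thread) is not
currently known to clients. The function name, logger name and the
millisecond units are assumptions of this example, not an existing API.

```python
import logging

log = logging.getLogger("thin_client.heartbeat")  # hypothetical logger name


def heartbeat_is_effective(heartbeat_interval_ms, idle_timeout_ms):
    """Return True if heartbeats can keep the connection alive; log a WARN otherwise."""
    if idle_timeout_ms <= 0:
        # A zero idle timeout means the server does not disconnect idle clients.
        return True
    if heartbeat_interval_ms < idle_timeout_ms:
        return True
    # Assumption of this sketch: an interval equal to the timeout is also
    # treated as too long, since the heartbeat could arrive just too late.
    log.warning(
        "Heartbeat interval (%d ms) is not less than the server idle timeout "
        "(%d ms); the server may close the connection between heartbeats.",
        heartbeat_interval_ms, idle_timeout_ms)
    return False
```

The check only reports the problem and leaves the user-specified interval
untouched, in line with the "only log a warning" decision above.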

On Mon, Feb 7, 2022 at 5:23 PM Pavel Tupitsyn <pt...@apache.org> wrote:

> Ivan, idleTimeout already exists, I don't think we should change the way
> it works (or did I misunderstand you?)
>
> Of course, enabling heartbeats means that otherwise idle clients will no
> longer be disconnected by the server.
> I think we should cross-link those properties in the documentation and
> explain this behavior.
>
> On Mon, Feb 7, 2022 at 4:39 PM Ivan Daschinsky <iv...@gmail.com>
> wrote:
>
>> >>3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
>> not zero, server disconnects idle clients
>> >>
>> But I suppose it would be great to have:
>> 1. If client supports keep alive, use idleTimeout
>> 2. If not, do not use it.
>>
>> But I am not sure if it is correct or not.
>>
>> On Mon, Feb 7, 2022 at 4:01 PM Maksim Timonin <ti...@apache.org> wrote:
>>
>> > I believe explicit is better than implicit :) Also in case of dynamic
>> > calculation of timeout, it can change dynamically, for example
>> restarting a
>> > cluster with different configuration should reconfigure clients too.
>> Looks
>> > complicated.
>> >
>> > My vote for WARN + javadocs with mention of this issue.
>> >
>> > On Mon, Feb 7, 2022 at 3:51 PM Pavel Tupitsyn <pt...@apache.org>
>> > wrote:
>> >
>> > > > WDYT, should we add a WARN message for clients that configure
>> > > > keepAliveTimeout greater than idleTimeout on the server side?
>> > >
>> > > I think we should either log a WARN, or retrieve idleTimeout from
>> server
>> > > and configure heartbeatTimeout accordingly (e.g. divide by 2).
>> > > Thoughts?
>> > >
>> > > On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <
>> timoninmaxim@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Pavel,
>> > > >
>> > > > Thanks for the links. Yes, I forgot that the flag of changed
>> topology
>> > is
>> > > > lazy. Also I missed that the keepAlive setting is configured on the
>> > > client
>> > > > side (alternatively to idleTimeout that is on the server side).
>> > > >
>> > > > Now I understand, this feature can be helpful then. Every client can
>> > > > configure itself in case it's possible to be idle sometimes, and
>> choose
>> > > > an appropriate timeout by itself too. And by default the feature
>> should
>> > > be
>> > > > disabled.
>> > > >
>> > > > WDYT, should we add a WARN message for clients that configure
>> > > > keepAliveTimeout greater than idleTimeout on the server side?
>> > > >
>> > > >
>> > > >
>> > > > On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <ptupitsyn@apache.org
>> >
>> > > > wrote:
>> > > >
>> > > > > Ivan,
>> > > > >
>> > > > > I suggest the following:
>> > > > >
>> > > > > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
>> > > > > OP_KEEP_ALIVE empty message
>> > > > > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
>> > > > > certain period of time
>> > > > > 3. Already implemented: when
>> ClientConnectorConfiguration#idleTimeout
>> > > is
>> > > > > not zero, server disconnects idle clients
>> > > > >
>> > > > > This way we don't need server->client keepalives, as you correctly
>> > > noted.
>> > > > >
>> > > > > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <
>> ivandasch@gmail.com
>> > >
>> > > > > wrote:
>> > > > >
>> > > > > > Pavel, I suppose that ideally:
>> > > > > > 1. Client send in handshake flag, that it supports KEEP_ALIVE
>> > feature
>> > > > and
>> > > > > > server takes it into account.
>> > > > > > 2. Each request of client can be considered as keep-alive ping.
>> > > > > > 3. Client send failure should be processed using retry policy.
>> > > > > > 4. Server should not send keep-alive packets, it is redundant,
>> but
>> > > > server
>> > > > > > should track requests from client and if there is no requests
>> from
>> > > > client
>> > > > > > with KEEP_ALIVE feature,
>> > > > > > automatically close connection and free resources.
>> > > > > >
>> > > > > > Similar approach is used in zookeeper clients.
>> > > > > >
>> > > > > > On Mon, Feb 7, 2022 at 12:24 PM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>> > > > > >
>> > > > > > > Ivan,
>> > > > > > >
>> > > > > > > Ideally, the check should come from both sides.
>> > > > > > > - Client periodically sends keepalive to server
>> > > > > > > - Server periodically sends keepalive to client
>> > > > > > >
>> > > > > > > Feature flags will be added accordingly, so it is not
>> necessary
>> > to
>> > > > > > > implement this in all thin clients.
>> > > > > > >
>> > > > > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <
>> > > ivandasch@gmail.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I suppose it is great idea, but this functionality can be
>> hard
>> > to
>> > > > > > > implement
>> > > > > > > > for some platforms. I.e. sync python client or php (there
>> is no
>> > > > real
>> > > > > > > > multithreading for python (GIL) and php is single threaded
>> by
>> > > > > design).
>> > > > > > > But
>> > > > > > > > for async clients it is not very hard to implement.
>> > Nevertheless,
>> > > > > this
>> > > > > > > > feature should be optional, because of possible technical
>> > > > > limitations.
>> > > > > > > >
>> > > > > > > > Pavel, is this check mostly for client side? Or servers can
>> do
>> > > some
>> > > > > > > actions
>> > > > > > > > if there is no activity from thin client (i.e. closing
>> context
>> > > and
>> > > > > free
>> > > > > > > > resources such as queries' handles and so on?)
>> > > > > > > >
>> > > > > > > > On Mon, Feb 7, 2022 at 11:09 AM Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
>> > > > > > > >
>> > > > > > > > > Hi Maksim,
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > half-state is a possible situation when an Ignite node
>> goes
>> > > > down
>> > > > > or
>> > > > > > > > > somehow removes connection to a thin client
>> > > > > > > > >
>> > > > > > > > > Half-open state is also possible when, for example, an
>> > > > intermediate
>> > > > > > > > router
>> > > > > > > > > is rebooted [1].
>> > > > > > > > >
>> > > > > > > > > This is what we seem to have encountered with one of our
>> > > > customers
>> > > > > -
>> > > > > > > they
>> > > > > > > > > have a stable cluster, and long-living (multiple days)
>> thin
>> > > > client
>> > > > > > > > > connections which can be idle for some time.
>> > > > > > > > > And only when we send some data on such an idle
>> connection do
>> > > we
>> > > > > > > discover
>> > > > > > > > > that it is broken.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > But with enabled (true by default) partitionAwareness
>> > feature
>> > > > > > clients
>> > > > > > > > can
>> > > > > > > > > be notified about topology changes
>> > > > > > > > >
>> > > > > > > > > Partition awareness is a "lazy" notification in a form of
>> a
>> > > > > response
>> > > > > > > > > message flag [2].
>> > > > > > > > > You won't get one on an idle connection.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > the connections are removed on the server side by client
>> > idle
>> > > > > > timeout
>> > > > > > > > >
>> > > > > > > > > Idle timeout is disabled by default.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > is it OK to keep such connections alive for a long time
>> > > > > > > > >
>> > > > > > > > > I think it is up to the user.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > > in the case of partition awareness features it can lead
>> to
>> > > > > wasting
>> > > > > > > TCP
>> > > > > > > > > sockets on Ignite nodes, can't it
>> > > > > > > > >
>> > > > > > > > > Can you please elaborate?
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > [1] https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
>> > > > > > > > > [2] https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
>> > > > > > > > >
>> > > > > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <
>> > > > > > timoninmaxim@apache.org
>> > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hi Pavel,
>> > > > > > > > > >
>> > > > > > > > > > Thanks for starting this thread! Can I ask some
>> questions
>> > > here
>> > > > to
>> > > > > > get
>> > > > > > > > the
>> > > > > > > > > > feature more clearly?
>> > > > > > > > > >
>> > > > > > > > > > As I understand it correctly, half-state is a possible
>> > > > situation
>> > > > > > when
>> > > > > > > > an
>> > > > > > > > > > Ignite node goes down or somehow removes connection to a
>> > thin
>> > > > > > client.
>> > > > > > > > But
>> > > > > > > > > > with enabled (true by default) partitionAwareness
>> feature
>> > > > clients
>> > > > > > can
>> > > > > > > > be
>> > > > > > > > > > notified about topology changes. So, there are possible
>> > > cases:
>> > > > > > > > > > 1. ThinClient connects to a single node.
>> > > > > > > > > > 2. Ignite node removes connection from itself.
>> > > > > > > > > >
>> > > > > > > > > > I like the idea for the case with a single node, as it
>> > helps
>> > > > fail
>> > > > > > > fast.
>> > > > > > > > > > But is it OK to connect a client to a single node only?
>> > > > > > > > > >
>> > > > > > > > > > For the second one: you mention that a case for the
>> second
>> > > > option
>> > > > > > is
>> > > > > > > > > > "Long-living and mostly idle connections are especially
>> > > > > susceptible
>> > > > > > > to
>> > > > > > > > > this
>> > > > > > > > > > behavior". If I understand correctly the connections are
>> > > > removed
>> > > > > on
>> > > > > > > the
>> > > > > > > > > > server side by client idle timeout. Can we just
>> configure
>> > the
>> > > > > idle
>> > > > > > > > > timeout
>> > > > > > > > > > for cases where we really need keeping alive idle
>> > > connections?
>> > > > > Are
>> > > > > > > > there
>> > > > > > > > > > any other cases with unexpectedly dropped connections?
>> > > > > > > > > >
>> > > > > > > > > > I'm wondering is it OK to keep such connections alive
>> for a
>> > > > long
>> > > > > > > time?
>> > > > > > > > > > Also in the case of partition awareness features it can
>> > lead
>> > > to
>> > > > > > > wasting
>> > > > > > > > > TCP
>> > > > > > > > > > sockets on Ignite nodes, can't it?
>> > > > > > > > > >
>> > > > > > > > > > Thanks!
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <
>> > > > > > ptupitsyn@apache.org>
>> > > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > >> Igniters,
>> > > > > > > > > >>
>> > > > > > > > > >> Please review the proposal to add heartbeat messages to
>> > the
>> > > > thin
>> > > > > > > > client
>> > > > > > > > > >> protocol (both 2.x and 3.x) and let me know your
>> thoughts:
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > > >>
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
>> > > > > > > > > >>
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Sincerely yours, Ivan Daschinskiy
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>>
>> --
>> Sincerely yours, Ivan Daschinskiy
>>
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Ivan, idleTimeout already exists, and I don't think we should change the way
it works (or did I misunderstand you?)

Of course, enabling heartbeats means that otherwise idle clients will no
longer be disconnected by the server.
I think we should cross-link those properties in the documentation and
explain this behavior.
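
To illustrate the interaction described above, here is a minimal sketch of
per-connection idle tracking. It is not Ignite code: the class and method
names are hypothetical, and it only assumes what the thread states, namely
that a zero idleTimeout disables the check and that a heartbeat is an
ordinary (empty) message from the server's point of view.

```java
/** Hypothetical model of per-connection idle tracking; not Ignite code. */
final class IdleTracker {
    /** Idle timeout in milliseconds; 0 means the idle timeout is disabled. */
    private final long idleTimeoutMs;

    /** Timestamp of the last message received on this connection. */
    private long lastActivityMs;

    IdleTracker(long idleTimeoutMs, long nowMs) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.lastActivityMs = nowMs;
    }

    /** Any incoming message, including an empty heartbeat, counts as activity. */
    void onMessage(long nowMs) {
        lastActivityMs = nowMs;
    }

    /** True when the server would treat the connection as idle and drop it. */
    boolean shouldDisconnect(long nowMs) {
        return idleTimeoutMs > 0 && nowMs - lastActivityMs >= idleTimeoutMs;
    }
}
```

With a 30-second idle timeout, a client that sends a heartbeat every 10
seconds never reaches the threshold, while a silent client is dropped after
30 seconds. That is the behavior the documentation would need to explain.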


Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
> 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> not zero, server disconnects idle clients

But I suppose it would be great to have:
1. If the client supports keep-alive, apply idleTimeout to it.
2. If not, do not apply it.

But I am not sure whether that is correct.
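
The two rules above can be sketched as a single decision. This is only an
illustration of the idea as proposed in this message, not existing server
behavior; the class name, method name, and the boolean flag are hypothetical,
and a return value of 0 is assumed to mean "never disconnect for idleness".

```java
/** Hypothetical sketch of the proposed rule; not Ignite code. */
final class IdleTimeoutPolicy {
    /**
     * Returns the idle timeout the server would apply to one connection:
     * the configured value for clients that declared keep-alive support,
     * and 0 (disabled) for clients that did not.
     */
    static long effectiveIdleTimeoutMs(long configuredIdleTimeoutMs, boolean clientSupportsKeepAlive) {
        return clientSupportsKeepAlive ? configuredIdleTimeoutMs : 0L;
    }
}
```

The trade-off is that older clients would then never be disconnected for
idleness, which changes how the existing idleTimeout setting behaves.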


-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Maksim Timonin <ti...@apache.org>.
I believe explicit is better than implicit :) Also, if the timeout is
calculated dynamically, it can change at runtime: for example, restarting a
cluster with a different configuration would have to reconfigure clients too.
That looks complicated.

My vote is for WARN + javadocs that mention this issue.
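
A minimal sketch of what such a WARN check could look like on the client
side. The class, method, and parameter names are hypothetical, and the sketch
assumes the client already knows the server's idleTimeout (how it would learn
that value is not settled in this thread). A zero idleTimeout is treated as
"disabled", as elsewhere in the discussion.

```java
import java.util.Optional;

/** Hypothetical client-side configuration check; not Ignite API. */
final class HeartbeatConfigCheck {
    /**
     * Returns a WARN text when heartbeats are sent less often than the
     * server's idle timeout allows, so the server may drop the connection
     * between two heartbeats. Returns empty when the configuration is fine
     * or the server idle timeout is disabled (0).
     */
    static Optional<String> warning(long heartbeatIntervalMs, long serverIdleTimeoutMs) {
        if (serverIdleTimeoutMs > 0 && heartbeatIntervalMs > serverIdleTimeoutMs) {
            return Optional.of("Heartbeat interval (" + heartbeatIntervalMs
                + " ms) is greater than server idle timeout (" + serverIdleTimeoutMs + " ms)");
        }
        return Optional.empty();
    }
}
```

The check only reports the mismatch and leaves the user's value untouched,
which keeps the configuration explicit.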


Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
> WDYT, should we add a WARN message for clients that configure
> keepAliveTimeout greater than idleTimeout on the server side?

I think we should either log a WARN, or retrieve idleTimeout from the server
and configure heartbeatTimeout accordingly (e.g. divide by 2).
Thoughts?
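
To make the two options above concrete, here is a minimal sketch that combines them: it logs a WARN and falls back to a fraction of the server idle timeout. The function name, the millisecond units, and the `divisor` parameter are assumptions for illustration only; none of them come from the Ignite code base.

```python
import logging

log = logging.getLogger("thin_client_sketch")  # hypothetical logger name


def effective_heartbeat_interval(configured_ms, server_idle_timeout_ms, divisor=2):
    """Return the heartbeat interval (ms) a client could use.

    Hypothetical helper: assumes the client has already learned the
    server-side idle timeout, where 0 or less means "disabled".
    """
    if server_idle_timeout_ms <= 0:
        # Idle timeout is disabled on the server, so nothing to adjust.
        return configured_ms
    if configured_ms < server_idle_timeout_ms:
        # Heartbeats already arrive before the server would disconnect.
        return configured_ms
    # The configured interval would let the server drop the connection first.
    log.warning(
        "Heartbeat interval %d ms is not less than server idle timeout %d ms; "
        "using idle timeout / %d instead",
        configured_ms, server_idle_timeout_ms, divisor,
    )
    return max(1, server_idle_timeout_ms // divisor)
```

With a 6000 ms idle timeout, a configured 10000 ms interval would be reduced to 3000 ms, while a configured 1000 ms interval would be left alone.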

On Mon, Feb 7, 2022 at 3:14 PM Maksim Timonin <ti...@apache.org>
wrote:

> Hi Pavel,
>
> Thanks for the links. Yes, I forgot that the flag of changed topology is
> lazy. Also I missed that the keepAlive setting is configured on the client
> side (alternatively to idleTimeout that is on the server side).
>
> Now I understand, this feature can be helpful then. Every client can
> configure itself in case it's possible to be idle sometimes, and choose
> an appropriate timeout by itself too. And by default the feature should be
> disabled.
>
> WDYT, should we add a WARN message for clients that configure
> keepAliveTimeout greater than idleTimeout on the server side?
>
>
>
> On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <pt...@apache.org>
> wrote:
>
> > Ivan,
> >
> > I suggest the following:
> >
> > 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> > OP_KEEP_ALIVE empty message
> > 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> > certain period of time
> > 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> > not zero, server disconnects idle clients
> >
> > This way we don't need server->client keepalives, as you correctly noted.
> >
> > On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> > > Pavel, I suppose that ideally:
> > > 1. Client send in handshake flag, that it supports KEEP_ALIVE feature
> and
> > > server takes it into account.
> > > 2. Each request of client can be considered as keep-alive ping.
> > > 3. Client send failure should be processed using retry policy.
> > > 4. Server should not send keep-alive packets, it is redundant, but
> server
> > > should track requests from client and if there is no requests from
> client
> > > with KEEP_ALIVE feature,
> > > automatically close connection and free resources.
> > >
> > > Similar approach is used in zookeeper clients.
> > >
> > > > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <pt...@apache.org> wrote:
> > >
> > > > Ivan,
> > > >
> > > > Ideally, the check should come from both sides.
> > > > - Client periodically sends keepalive to server
> > > > - Server periodically sends keepalive to client
> > > >
> > > > Feature flags will be added accordingly, so it is not necessary to
> > > > implement this in all thin clients.
> > > >
> > > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <ivandasch@gmail.com
> >
> > > > wrote:
> > > >
> > > > > I suppose it is great idea, but this functionality can be hard to
> > > > implement
> > > > > for some platforms. I.e. sync python client or php (there is no
> real
> > > > > multithreading for python (GIL) and php is single threaded by
> > design).
> > > > But
> > > > > for async clients it is not very hard to implement. Nevertheless,
> > this
> > > > > feature should be optional, because of possible technical
> > limitations.
> > > > >
> > > > > Pavel, is this check mostly for client side? Or servers can do some
> > > > actions
> > > > > if there is no activity from thin client (i.e. closing context and
> > free
> > > > > resources such as queries' handles and so on?)
> > > > >
> > > > > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <ptupitsyn@apache.org> wrote:
> > > > >
> > > > > > Hi Maksim,
> > > > > >
> > > > > >
> > > > > > > half-state is a possible situation when an Ignite node goes
> down
> > or
> > > > > > somehow removes connection to a thin client
> > > > > >
> > > > > > Half-open state is also possible when, for example, an
> intermediate
> > > > > router
> > > > > > is rebooted [1].
> > > > > >
> > > > > > This is what we seem to have encountered with one of our
> customers
> > -
> > > > they
> > > > > > have a stable cluster, and long-living (multiple days) thin
> client
> > > > > > connections which can be idle for some time.
> > > > > > And only when we send some data on such an idle connection do we
> > > > discover
> > > > > > that it is broken.
> > > > > >
> > > > > >
> > > > > > > But with enabled (true by default) partitionAwareness feature
> > > clients
> > > > > can
> > > > > > be notified about topology changes
> > > > > >
> > > > > > Partition awareness is a "lazy" notification in a form of a
> > response
> > > > > > message flag [2].
> > > > > > You won't get one on an idle connection.
> > > > > >
> > > > > >
> > > > > > > the connections are removed on the server side by client idle
> > > timeout
> > > > > >
> > > > > > Idle timeout is disabled by default.
> > > > > >
> > > > > >
> > > > > > > is it OK to keep such connections alive for a long time
> > > > > >
> > > > > > I think it is up to the user.
> > > > > >
> > > > > >
> > > > > > > in the case of partition awareness features it can lead to
> > wasting
> > > > TCP
> > > > > > sockets on Ignite nodes, can't it
> > > > > >
> > > > > > Can you please elaborate?
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > > >
> > >
> >
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > > >
> > > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <
> > > timoninmaxim@apache.org
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Pavel,
> > > > > > >
> > > > > > > Thanks for starting this thread! Can I ask some questions here
> to
> > > get
> > > > > the
> > > > > > > feature more clearly?
> > > > > > >
> > > > > > > As I understand it correctly, half-state is a possible
> situation
> > > when
> > > > > an
> > > > > > > Ignite node goes down or somehow removes connection to a thin
> > > client.
> > > > > But
> > > > > > > with enabled (true by default) partitionAwareness feature
> clients
> > > can
> > > > > be
> > > > > > > notified about topology changes. So, there are possible cases:
> > > > > > > 1. ThinClient connects to a single node.
> > > > > > > 2. Ignite node removes connection from itself.
> > > > > > >
> > > > > > > I like the idea for the case with a single node, as it helps
> fail
> > > > fast.
> > > > > > > But is it OK to connect a client to a single node only?
> > > > > > >
> > > > > > > For the second one: you mention that a case for the second
> option
> > > is
> > > > > > > "Long-living and mostly idle connections are especially
> > susceptible
> > > > to
> > > > > > this
> > > > > > > behavior". If I understand correctly the connections are
> removed
> > on
> > > > the
> > > > > > > server side by client idle timeout. Can we just configure the
> > idle
> > > > > > timeout
> > > > > > > for cases where we really need keeping alive idle connections?
> > Are
> > > > > there
> > > > > > > any other cases with unexpectedly dropped connections?
> > > > > > >
> > > > > > > I'm wondering is it OK to keep such connections alive for a
> long
> > > > time?
> > > > > > > Also in the case of partition awareness features it can lead to
> > > > wasting
> > > > > > TCP
> > > > > > > sockets on Ignite nodes, can't it?
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <
> > > ptupitsyn@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Igniters,
> > > > > > >>
> > > > > > >> Please review the proposal to add heartbeat messages to the
> thin
> > > > > client
> > > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours, Ivan Daschinskiy
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sincerely yours, Ivan Daschinskiy
> > >
> >
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Maksim Timonin <ti...@apache.org>.
Hi Pavel,

Thanks for the links. Yes, I forgot that the topology-change flag is
lazy. I also missed that the keepAlive setting is configured on the client
side (as opposed to idleTimeout, which is set on the server side).

Now I understand that this feature can be helpful. Each client that may
sit idle from time to time can enable it and choose an appropriate
timeout for itself. By default, the feature should be disabled.

WDYT, should we add a WARN message for clients that configure
keepAliveTimeout greater than idleTimeout on the server side?



On Mon, Feb 7, 2022 at 1:05 PM Pavel Tupitsyn <pt...@apache.org> wrote:

> Ivan,
>
> I suggest the following:
>
> 1. Server sends KEEP_ALIVE feature flag, which means it accepts
> OP_KEEP_ALIVE empty message
> 2. Client sends OP_KEEP_ALIVE when the connection is idle for a
> certain period of time
> 3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
> not zero, server disconnects idle clients
>
> This way we don't need server->client keepalives, as you correctly noted.
>
> On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <iv...@gmail.com>
> wrote:
>
> > Pavel, I suppose that ideally:
> > 1. Client send in handshake flag, that it supports KEEP_ALIVE feature and
> > server takes it into account.
> > 2. Each request of client can be considered as keep-alive ping.
> > 3. Client send failure should be processed using retry policy.
> > 4. Server should not send keep-alive packets, it is redundant, but server
> > should track requests from client and if there is no requests from client
> > with KEEP_ALIVE feature,
> > automatically close connection and free resources.
> >
> > Similar approach is used in zookeeper clients.
> >
> > On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <pt...@apache.org> wrote:
> >
> > > Ivan,
> > >
> > > Ideally, the check should come from both sides.
> > > - Client periodically sends keepalive to server
> > > - Server periodically sends keepalive to client
> > >
> > > Feature flags will be added accordingly, so it is not necessary to
> > > implement this in all thin clients.
> > >
> > > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <iv...@gmail.com>
> > > wrote:
> > >
> > > > I suppose it is great idea, but this functionality can be hard to
> > > implement
> > > > for some platforms. I.e. sync python client or php (there is no real
> > > > multithreading for python (GIL) and php is single threaded by
> design).
> > > But
> > > > for async clients it is not very hard to implement. Nevertheless,
> this
> > > > feature should be optional, because of possible technical
> limitations.
> > > >
> > > > Pavel, is this check mostly for client side? Or servers can do some
> > > actions
> > > > if there is no activity from thin client (i.e. closing context and
> free
> > > > resources such as queries' handles and so on?)
> > > >
> > > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <pt...@apache.org> wrote:
> > > >
> > > > > Hi Maksim,
> > > > >
> > > > >
> > > > > > half-state is a possible situation when an Ignite node goes down
> or
> > > > > somehow removes connection to a thin client
> > > > >
> > > > > Half-open state is also possible when, for example, an intermediate
> > > > router
> > > > > is rebooted [1].
> > > > >
> > > > > This is what we seem to have encountered with one of our customers
> -
> > > they
> > > > > have a stable cluster, and long-living (multiple days) thin client
> > > > > connections which can be idle for some time.
> > > > > And only when we send some data on such an idle connection do we
> > > discover
> > > > > that it is broken.
> > > > >
> > > > >
> > > > > > But with enabled (true by default) partitionAwareness feature
> > clients
> > > > can
> > > > > be notified about topology changes
> > > > >
> > > > > Partition awareness is a "lazy" notification in a form of a
> response
> > > > > message flag [2].
> > > > > You won't get one on an idle connection.
> > > > >
> > > > >
> > > > > > the connections are removed on the server side by client idle
> > timeout
> > > > >
> > > > > Idle timeout is disabled by default.
> > > > >
> > > > >
> > > > > > is it OK to keep such connections alive for a long time
> > > > >
> > > > > I think it is up to the user.
> > > > >
> > > > >
> > > > > > in the case of partition awareness features it can lead to
> wasting
> > > TCP
> > > > > sockets on Ignite nodes, can't it
> > > > >
> > > > > Can you please elaborate?
> > > > >
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > > >
> > > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <
> > timoninmaxim@apache.org
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Pavel,
> > > > > >
> > > > > > Thanks for starting this thread! Can I ask some questions here to
> > get
> > > > the
> > > > > > feature more clearly?
> > > > > >
> > > > > > As I understand it correctly, half-state is a possible situation
> > when
> > > > an
> > > > > > Ignite node goes down or somehow removes connection to a thin
> > client.
> > > > But
> > > > > > with enabled (true by default) partitionAwareness feature clients
> > can
> > > > be
> > > > > > notified about topology changes. So, there are possible cases:
> > > > > > 1. ThinClient connects to a single node.
> > > > > > 2. Ignite node removes connection from itself.
> > > > > >
> > > > > > I like the idea for the case with a single node, as it helps fail
> > > fast.
> > > > > > But is it OK to connect a client to a single node only?
> > > > > >
> > > > > > For the second one: you mention that a case for the second option
> > is
> > > > > > "Long-living and mostly idle connections are especially
> susceptible
> > > to
> > > > > this
> > > > > > behavior". If I understand correctly the connections are removed
> on
> > > the
> > > > > > server side by client idle timeout. Can we just configure the
> idle
> > > > > timeout
> > > > > > for cases where we really need keeping alive idle connections?
> Are
> > > > there
> > > > > > any other cases with unexpectedly dropped connections?
> > > > > >
> > > > > > I'm wondering is it OK to keep such connections alive for a long
> > > time?
> > > > > > Also in the case of partition awareness features it can lead to
> > > wasting
> > > > > TCP
> > > > > > sockets on Ignite nodes, can't it?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <
> > ptupitsyn@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Igniters,
> > > > > >>
> > > > > >> Please review the proposal to add heartbeat messages to the thin
> > > > client
> > > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > > >>
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sincerely yours, Ivan Daschinskiy
> > > >
> > >
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Ivan,

I suggest the following:

1. The server sends the KEEP_ALIVE feature flag, which means it accepts
an empty OP_KEEP_ALIVE message
2. The client sends OP_KEEP_ALIVE when the connection has been idle for a
certain period of time
3. Already implemented: when ClientConnectorConfiguration#idleTimeout is
not zero, the server disconnects idle clients

This way we don't need server->client keepalives, as you correctly noted.
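
The client side of steps 1 and 2 could look roughly like the sketch below. It is only an illustration: the class name, the `poll` method, and the injected `send_keep_alive` callback are hypothetical, and a real client would drive `poll` from a timer or an event loop rather than calling it by hand.

```python
import time


class KeepAliveSender:
    """Sends a keepalive only when the connection has been idle long enough."""

    def __init__(self, send_keep_alive, interval_s, server_supports_keep_alive,
                 clock=time.monotonic):
        self._send = send_keep_alive      # callback that writes OP_KEEP_ALIVE
        self._interval = interval_s
        # Step 1: only active if the server advertised the feature flag.
        self._enabled = bool(server_supports_keep_alive) and interval_s > 0
        self._clock = clock
        self._last_activity = clock()

    def on_request_sent(self):
        # Any regular request resets the idle period.
        self._last_activity = self._clock()

    def poll(self):
        """Step 2: send a keepalive if idle for at least the interval."""
        if not self._enabled:
            return False
        now = self._clock()
        if now - self._last_activity < self._interval:
            return False
        self._send()
        self._last_activity = now
        return True
```

Injecting the clock keeps the sketch testable without sleeping; regular traffic resets the idle period, so a busy connection never sends an extra message.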

On Mon, Feb 7, 2022 at 12:43 PM Ivan Daschinsky <iv...@gmail.com> wrote:

> Pavel, I suppose that ideally:
> 1. Client send in handshake flag, that it supports KEEP_ALIVE feature and
> server takes it into account.
> 2. Each request of client can be considered as keep-alive ping.
> 3. Client send failure should be processed using retry policy.
> 4. Server should not send keep-alive packets, it is redundant, but server
> should track requests from client and if there is no requests from client
> with KEEP_ALIVE feature,
> automatically close connection and free resources.
>
> Similar approach is used in zookeeper clients.
>
> On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <pt...@apache.org> wrote:
>
> > Ivan,
> >
> > Ideally, the check should come from both sides.
> > - Client periodically sends keepalive to server
> > - Server periodically sends keepalive to client
> >
> > Feature flags will be added accordingly, so it is not necessary to
> > implement this in all thin clients.
> >
> > On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <iv...@gmail.com>
> > wrote:
> >
> > > I suppose it is great idea, but this functionality can be hard to
> > implement
> > > for some platforms. I.e. sync python client or php (there is no real
> > > multithreading for python (GIL) and php is single threaded by design).
> > But
> > > for async clients it is not very hard to implement. Nevertheless, this
> > > feature should be optional, because of possible technical limitations.
> > >
> > > Pavel, is this check mostly for client side? Or servers can do some
> > actions
> > > if there is no activity from thin client (i.e. closing context and free
> > > resources such as queries' handles and so on?)
> > >
> > > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <pt...@apache.org> wrote:
> > >
> > > > Hi Maksim,
> > > >
> > > >
> > > > > half-state is a possible situation when an Ignite node goes down or
> > > > somehow removes connection to a thin client
> > > >
> > > > Half-open state is also possible when, for example, an intermediate
> > > router
> > > > is rebooted [1].
> > > >
> > > > This is what we seem to have encountered with one of our customers -
> > they
> > > > have a stable cluster, and long-living (multiple days) thin client
> > > > connections which can be idle for some time.
> > > > And only when we send some data on such an idle connection do we
> > discover
> > > > that it is broken.
> > > >
> > > >
> > > > > But with enabled (true by default) partitionAwareness feature
> clients
> > > can
> > > > be notified about topology changes
> > > >
> > > > Partition awareness is a "lazy" notification in a form of a response
> > > > message flag [2].
> > > > You won't get one on an idle connection.
> > > >
> > > >
> > > > > the connections are removed on the server side by client idle
> timeout
> > > >
> > > > Idle timeout is disabled by default.
> > > >
> > > >
> > > > > is it OK to keep such connections alive for a long time
> > > >
> > > > I think it is up to the user.
> > > >
> > > >
> > > > > in the case of partition awareness features it can lead to wasting
> > TCP
> > > > sockets on Ignite nodes, can't it
> > > >
> > > > Can you please elaborate?
> > > >
> > > >
> > > > [1]
> > > >
> > >
> >
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > > [2]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > > >
> > > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <
> timoninmaxim@apache.org
> > >
> > > > wrote:
> > > >
> > > > > Hi Pavel,
> > > > >
> > > > > Thanks for starting this thread! Can I ask some questions here to
> get
> > > the
> > > > > feature more clearly?
> > > > >
> > > > > As I understand it correctly, half-state is a possible situation
> when
> > > an
> > > > > Ignite node goes down or somehow removes connection to a thin
> client.
> > > But
> > > > > with enabled (true by default) partitionAwareness feature clients
> can
> > > be
> > > > > notified about topology changes. So, there are possible cases:
> > > > > 1. ThinClient connects to a single node.
> > > > > 2. Ignite node removes connection from itself.
> > > > >
> > > > > I like the idea for the case with a single node, as it helps fail
> > fast.
> > > > > But is it OK to connect a client to a single node only?
> > > > >
> > > > > For the second one: you mention that a case for the second option
> is
> > > > > "Long-living and mostly idle connections are especially susceptible
> > to
> > > > this
> > > > > behavior". If I understand correctly the connections are removed on
> > the
> > > > > server side by client idle timeout. Can we just configure the idle
> > > > timeout
> > > > > for cases where we really need keeping alive idle connections? Are
> > > there
> > > > > any other cases with unexpectedly dropped connections?
> > > > >
> > > > > I'm wondering is it OK to keep such connections alive for a long
> > time?
> > > > > Also in the case of partition awareness features it can lead to
> > wasting
> > > > TCP
> > > > > sockets on Ignite nodes, can't it?
> > > > >
> > > > > Thanks!
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <
> ptupitsyn@apache.org>
> > > > > wrote:
> > > > >
> > > > >> Igniters,
> > > > >>
> > > > >> Please review the proposal to add heartbeat messages to the thin
> > > client
> > > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > > > >>
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > > >>
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sincerely yours, Ivan Daschinskiy
> > >
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
Pavel, I suppose that ideally:
1. The client sends a flag in the handshake indicating that it supports
the KEEP_ALIVE feature, and the server takes it into account.
2. Each client request can be considered a keep-alive ping.
3. A client send failure should be handled using the retry policy.
4. The server should not send keep-alive packets, since that is redundant,
but it should track requests from the client and, if there are no requests
from a client with the KEEP_ALIVE feature, automatically close the
connection and free resources.

A similar approach is used in ZooKeeper clients.
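
Point 4 can be sketched as server-side bookkeeping along the following lines. This is a hypothetical illustration in Python, not Ignite server code: the class and method names are made up, and the caller is assumed to pass in the current time and to close whatever connections `expire` returns.

```python
class IdleConnectionTracker:
    """Tracks the last request time of clients that announced KEEP_ALIVE."""

    def __init__(self, idle_timeout_s):
        self._idle_timeout = idle_timeout_s
        self._last_seen = {}  # connection id -> time of last request

    def on_handshake(self, conn_id, supports_keep_alive, now):
        # Only clients that announced the feature are subject to expiry.
        if supports_keep_alive:
            self._last_seen[conn_id] = now

    def on_request(self, conn_id, now):
        # Every request counts as a keep-alive ping (point 2).
        if conn_id in self._last_seen:
            self._last_seen[conn_id] = now

    def expire(self, now):
        """Return ids of connections idle for longer than the timeout."""
        if self._idle_timeout <= 0:
            return []  # a non-positive timeout means the check is disabled
        expired = [c for c, t in self._last_seen.items()
                   if now - t > self._idle_timeout]
        for c in expired:
            del self._last_seen[c]
        return expired
```

Clients that did not announce the feature are never expired here, which matches the idea that the feature stays optional for platforms where it is hard to implement.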

On Mon, Feb 7, 2022 at 12:24, Pavel Tupitsyn <pt...@apache.org> wrote:

> Ivan,
>
> Ideally, the check should come from both sides.
> - Client periodically sends keepalive to server
> - Server periodically sends keepalive to client
>
> Feature flags will be added accordingly, so it is not necessary to
> implement this in all thin clients.
>
> On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <iv...@gmail.com>
> wrote:
>
> > I suppose it is great idea, but this functionality can be hard to
> implement
> > for some platforms. I.e. sync python client or php (there is no real
> > multithreading for python (GIL) and php is single threaded by design).
> But
> > for async clients it is not very hard to implement. Nevertheless, this
> > feature should be optional, because of possible technical limitations.
> >
> > Pavel, is this check mostly for client side? Or servers can do some
> actions
> > if there is no activity from thin client (i.e. closing context and free
> > resources such as queries' handles and so on?)
> >
> > On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <pt...@apache.org> wrote:
> >
> > > Hi Maksim,
> > >
> > >
> > > > half-state is a possible situation when an Ignite node goes down or
> > > somehow removes connection to a thin client
> > >
> > > Half-open state is also possible when, for example, an intermediate
> > router
> > > is rebooted [1].
> > >
> > > This is what we seem to have encountered with one of our customers -
> they
> > > have a stable cluster, and long-living (multiple days) thin client
> > > connections which can be idle for some time.
> > > And only when we send some data on such an idle connection do we
> discover
> > > that it is broken.
> > >
> > >
> > > > But with enabled (true by default) partitionAwareness feature clients
> > can
> > > be notified about topology changes
> > >
> > > Partition awareness is a "lazy" notification in a form of a response
> > > message flag [2].
> > > You won't get one on an idle connection.
> > >
> > >
> > > > the connections are removed on the server side by client idle timeout
> > >
> > > Idle timeout is disabled by default.
> > >
> > >
> > > > is it OK to keep such connections alive for a long time
> > >
> > > I think it is up to the user.
> > >
> > >
> > > > in the case of partition awareness features it can lead to wasting
> TCP
> > > sockets on Ignite nodes, can't it
> > >
> > > Can you please elaborate?
> > >
> > >
> > > [1]
> > >
> >
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> > >
> > > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <timoninmaxim@apache.org
> >
> > > wrote:
> > >
> > > > Hi Pavel,
> > > >
> > > > Thanks for starting this thread! Can I ask some questions here to get
> > the
> > > > feature more clearly?
> > > >
> > > > As I understand it correctly, half-state is a possible situation when
> > an
> > > > Ignite node goes down or somehow removes connection to a thin client.
> > But
> > > > with enabled (true by default) partitionAwareness feature clients can
> > be
> > > > notified about topology changes. So, there are possible cases:
> > > > 1. ThinClient connects to a single node.
> > > > 2. Ignite node removes connection from itself.
> > > >
> > > > I like the idea for the case with a single node, as it helps fail
> fast.
> > > > But is it OK to connect a client to a single node only?
> > > >
> > > > For the second one: you mention that a case for the second option is
> > > > "Long-living and mostly idle connections are especially susceptible
> to
> > > this
> > > > behavior". If I understand correctly the connections are removed on
> the
> > > > server side by client idle timeout. Can we just configure the idle
> > > timeout
> > > > for cases where we really need keeping alive idle connections? Are
> > there
> > > > any other cases with unexpectedly dropped connections?
> > > >
> > > > I'm wondering is it OK to keep such connections alive for a long
> time?
> > > > Also in the case of partition awareness features it can lead to
> wasting
> > > TCP
> > > > sockets on Ignite nodes, can't it?
> > > >
> > > > Thanks!
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <pt...@apache.org>
> > > > wrote:
> > > >
> > > >> Igniters,
> > > >>
> > > >> Please review the proposal to add heartbeat messages to the thin
> > client
> > > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > > >>
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > > >>
> > > >
> > >
> >
> >
> > --
> > Sincerely yours, Ivan Daschinskiy
> >
>


-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Ivan,

Ideally, the check should come from both sides.
- Client periodically sends keepalive to server
- Server periodically sends keepalive to client

Feature flags will be added accordingly, so it is not necessary to
implement this in all thin clients.

On Mon, Feb 7, 2022 at 11:43 AM Ivan Daschinsky <iv...@gmail.com> wrote:

> I suppose it is great idea, but this functionality can be hard to implement
> for some platforms. I.e. sync python client or php (there is no real
> multithreading for python (GIL) and php is single threaded by design). But
> for async clients it is not very hard to implement. Nevertheless, this
> feature should be optional, because of possible technical limitations.
>
> Pavel, is this check mostly for client side? Or servers can do some actions
> if there is no activity from thin client (i.e. closing context and free
> resources such as queries' handles and so on?)
>
> On Mon, Feb 7, 2022 at 11:09, Pavel Tupitsyn <pt...@apache.org> wrote:
>
> > Hi Maksim,
> >
> >
> > > half-state is a possible situation when an Ignite node goes down or
> > somehow removes connection to a thin client
> >
> > Half-open state is also possible when, for example, an intermediate
> router
> > is rebooted [1].
> >
> > This is what we seem to have encountered with one of our customers - they
> > have a stable cluster, and long-living (multiple days) thin client
> > connections which can be idle for some time.
> > And only when we send some data on such an idle connection do we discover
> > that it is broken.
> >
> >
> > > But with enabled (true by default) partitionAwareness feature clients
> can
> > be notified about topology changes
> >
> > Partition awareness is a "lazy" notification in a form of a response
> > message flag [2].
> > You won't get one on an idle connection.
> >
> >
> > > the connections are removed on the server side by client idle timeout
> >
> > Idle timeout is disabled by default.
> >
> >
> > > is it OK to keep such connections alive for a long time
> >
> > I think it is up to the user.
> >
> >
> > > in the case of partition awareness features it can lead to wasting TCP
> > sockets on Ignite nodes, can't it
> >
> > Can you please elaborate?
> >
> >
> > [1]
> >
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
> >
> > On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <ti...@apache.org>
> > wrote:
> >
> > > Hi Pavel,
> > >
> > > Thanks for starting this thread! Can I ask some questions here to get the
> > > feature more clearly?
> > >
> > > As I understand it correctly, half-state is a possible situation when an
> > > Ignite node goes down or somehow removes connection to a thin client. But
> > > with enabled (true by default) partitionAwareness feature clients can be
> > > notified about topology changes. So, there are possible cases:
> > > 1. ThinClient connects to a single node.
> > > 2. Ignite node removes connection from itself.
> > >
> > > I like the idea for the case with a single node, as it helps fail fast.
> > > But is it OK to connect a client to a single node only?
> > >
> > > For the second one: you mention that a case for the second option is
> > > "Long-living and mostly idle connections are especially susceptible to this
> > > behavior". If I understand correctly the connections are removed on the
> > > server side by client idle timeout. Can we just configure the idle timeout
> > > for cases where we really need keeping alive idle connections? Are there
> > > any other cases with unexpectedly dropped connections?
> > >
> > > I'm wondering is it OK to keep such connections alive for a long time?
> > > Also in the case of partition awareness features it can lead to wasting TCP
> > > sockets on Ignite nodes, can't it?
> > >
> > > Thanks!
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <pt...@apache.org>
> > > wrote:
> > >
> > >> Igniters,
> > >>
> > >> Please review the proposal to add heartbeat messages to the thin client
> > >> protocol (both 2.x and 3.x) and let me know your thoughts:
> > >>
> > >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> > >>
> > >
> >
>
>
> --
> Sincerely yours, Ivan Daschinskiy
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Ivan Daschinsky <iv...@gmail.com>.
I think it is a great idea, but this functionality can be hard to implement
on some platforms, e.g. the sync Python client or PHP (Python has no real
multithreading because of the GIL, and PHP is single-threaded by design).
For async clients it is not very hard to implement. Nevertheless, this
feature should be optional because of possible technical limitations.
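
To illustrate the limitation for single-threaded sync clients, here is a
minimal sketch of one possible workaround, not something proposed in the
IEP: the application drives the heartbeat itself by calling poll() from its
own loop. The class name PolledHeartbeat, the ping callable and the clock
parameter are all hypothetical and are not part of any Ignite client.

```python
import time


class PolledHeartbeat:
    """Heartbeat without a background thread; the caller must invoke poll()."""

    def __init__(self, ping, interval_sec, clock=time.monotonic):
        self._ping = ping                # hypothetical: sends one heartbeat request
        self._interval = interval_sec
        self._clock = clock              # injectable so the logic can be tested
        self._last_activity = clock()

    def note_activity(self):
        """Call after any successful request: regular traffic replaces a heartbeat."""
        self._last_activity = self._clock()

    def poll(self):
        """Send a heartbeat if the connection has been idle for a full interval.

        Returns True if a heartbeat was sent. Any exception raised by ping
        propagates to the caller, which is how a broken connection shows up.
        """
        if self._clock() - self._last_activity < self._interval:
            return False
        self._ping()
        self._last_activity = self._clock()
        return True
```

The obvious drawback is that nothing is sent while the application does not
call poll(), so this only helps programs that already have some kind of
periodic loop.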

Pavel, is this check mostly for the client side? Or can servers take some
action if there is no activity from a thin client (e.g. closing the context
and freeing resources such as query handles)?

On Mon, Feb 7, 2022 at 11:09 AM Pavel Tupitsyn <pt...@apache.org> wrote:

> Hi Maksim,
>
>
> > half-state is a possible situation when an Ignite node goes down or
> > somehow removes connection to a thin client
>
> Half-open state is also possible when, for example, an intermediate router
> is rebooted [1].
>
> This is what we seem to have encountered with one of our customers - they
> have a stable cluster, and long-living (multiple days) thin client
> connections which can be idle for some time.
> And only when we send some data on such an idle connection do we discover
> that it is broken.
>
>
> > But with enabled (true by default) partitionAwareness feature clients can
> > be notified about topology changes
>
> Partition awareness is a "lazy" notification in a form of a response
> message flag [2].
> You won't get one on an idle connection.
>
>
> > the connections are removed on the server side by client idle timeout
>
> Idle timeout is disabled by default.
>
>
> > is it OK to keep such connections alive for a long time
>
> I think it is up to the user.
>
>
> > in the case of partition awareness features it can lead to wasting TCP
> > sockets on Ignite nodes, can't it
>
> Can you please elaborate?
>
>
> [1]
> https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
> [2]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients
>
> On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <ti...@apache.org>
> wrote:
>
> > Hi Pavel,
> >
> > Thanks for starting this thread! Can I ask some questions here to get the
> > feature more clearly?
> >
> > As I understand it correctly, half-state is a possible situation when an
> > Ignite node goes down or somehow removes connection to a thin client. But
> > with enabled (true by default) partitionAwareness feature clients can be
> > notified about topology changes. So, there are possible cases:
> > 1. ThinClient connects to a single node.
> > 2. Ignite node removes connection from itself.
> >
> > I like the idea for the case with a single node, as it helps fail fast.
> > But is it OK to connect a client to a single node only?
> >
> > For the second one: you mention that a case for the second option is
> > "Long-living and mostly idle connections are especially susceptible to this
> > behavior". If I understand correctly the connections are removed on the
> > server side by client idle timeout. Can we just configure the idle timeout
> > for cases where we really need keeping alive idle connections? Are there
> > any other cases with unexpectedly dropped connections?
> >
> > I'm wondering is it OK to keep such connections alive for a long time?
> > Also in the case of partition awareness features it can lead to wasting TCP
> > sockets on Ignite nodes, can't it?
> >
> > Thanks!
> >
> >
> >
> >
> >
> >
> > On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <pt...@apache.org>
> > wrote:
> >
> >> Igniters,
> >>
> >> Please review the proposal to add heartbeat messages to the thin client
> >> protocol (both 2.x and 3.x) and let me know your thoughts:
> >>
> >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
> >>
> >
>


-- 
Sincerely yours, Ivan Daschinskiy

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Pavel Tupitsyn <pt...@apache.org>.
Hi Maksim,


> half-state is a possible situation when an Ignite node goes down or
> somehow removes connection to a thin client

Half-open state is also possible when, for example, an intermediate router
is rebooted [1].

This is what we seem to have encountered with one of our customers - they
have a stable cluster, and long-living (multiple days) thin client
connections which can be idle for some time.
And only when we send some data on such an idle connection do we discover
that it is broken.
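
As a rough illustration of the idea (a minimal sketch, not the IEP
implementation): a background thread sends a small request at a fixed
interval, so a half-open connection is discovered within roughly one
interval instead of on the next user operation. The class name
HeartbeatMonitor and the ping / on_broken callables are hypothetical and
stand in for whatever a real client would use.

```python
import threading


class HeartbeatMonitor:
    """Calls ping periodically on a background thread and reports the first failure."""

    def __init__(self, ping, interval_sec, on_broken):
        self._ping = ping              # hypothetical: sends a heartbeat and waits for the reply
        self._interval = interval_sec
        self._on_broken = on_broken    # called once, with the error, when a heartbeat fails
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        """Stop the loop. Call from another thread, not from on_broken."""
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() was called.
        while not self._stop.wait(self._interval):
            try:
                self._ping()
            except OSError as err:
                # A dead peer usually surfaces as a timeout or a connection reset.
                self._on_broken(err)
                return
```

In this sketch the monitor stops after the first failure; whether to
reconnect or fail over at that point is left to the caller.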


> But with enabled (true by default) partitionAwareness feature clients can
> be notified about topology changes

Partition awareness is a "lazy" notification in the form of a response
message flag [2].
You won't get one on an idle connection.


> the connections are removed on the server side by client idle timeout

Idle timeout is disabled by default.


> is it OK to keep such connections alive for a long time

I think it is up to the user.


> in the case of partition awareness features it can lead to wasting TCP
> sockets on Ignite nodes, can't it

Can you please elaborate?


[1]
https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html
[2]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-23%3A+Best+Effort+Affinity+for+Thin+Clients

On Fri, Feb 4, 2022 at 4:01 PM Maksim Timonin <ti...@apache.org>
wrote:

> Hi Pavel,
>
> Thanks for starting this thread! Can I ask some questions here to get the
> feature more clearly?
>
> As I understand it correctly, half-state is a possible situation when an
> Ignite node goes down or somehow removes connection to a thin client. But
> with enabled (true by default) partitionAwareness feature clients can be
> notified about topology changes. So, there are possible cases:
> 1. ThinClient connects to a single node.
> 2. Ignite node removes connection from itself.
>
> I like the idea for the case with a single node, as it helps fail fast.
> But is it OK to connect a client to a single node only?
>
> For the second one: you mention that a case for the second option is
> "Long-living and mostly idle connections are especially susceptible to this
> behavior". If I understand correctly the connections are removed on the
> server side by client idle timeout. Can we just configure the idle timeout
> for cases where we really need keeping alive idle connections? Are there
> any other cases with unexpectedly dropped connections?
>
> I'm wondering is it OK to keep such connections alive for a long time?
> Also in the case of partition awareness features it can lead to wasting TCP
> sockets on Ignite nodes, can't it?
>
> Thanks!
>
>
>
>
>
>
> On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <pt...@apache.org>
> wrote:
>
>> Igniters,
>>
>> Please review the proposal to add heartbeat messages to the thin client
>> protocol (both 2.x and 3.x) and let me know your thoughts:
>>
>>
>> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
>>
>

Re: IEP-83 Thin Client Keepalive (heartbeat)

Posted by Maksim Timonin <ti...@apache.org>.
Hi Pavel,

Thanks for starting this thread! Can I ask some questions here to
understand the feature better?

If I understand correctly, half-state is a possible situation when an
Ignite node goes down or somehow removes connection to a thin client. But
with enabled (true by default) partitionAwareness feature clients can be
notified about topology changes. So, there are possible cases:
1. ThinClient connects to a single node.
2. Ignite node removes connection from itself.

I like the idea for the case with a single node, as it helps fail fast. But
is it OK to connect a client to a single node only?

For the second one: you mention that a case for the second option is
"Long-living and mostly idle connections are especially susceptible to this
behavior". If I understand correctly the connections are removed on the
server side by client idle timeout. Can we just configure the idle timeout
for cases where we really need to keep idle connections alive? Are there
any other cases with unexpectedly dropped connections?

I'm wondering: is it OK to keep such connections alive for a long time? Also
in the case of partition awareness features it can lead to wasting TCP
sockets on Ignite nodes, can't it?

Thanks!






On Thu, Feb 3, 2022 at 2:24 PM Pavel Tupitsyn <pt...@apache.org> wrote:

> Igniters,
>
> Please review the proposal to add heartbeat messages to the thin client
> protocol (both 2.x and 3.x) and let me know your thoughts:
>
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-83+Thin+Client+Keepalive
>