You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@qpid.apache.org by Alan Conway <ac...@redhat.com> on 2016/10/17 15:57:49 UTC

Re: API and terms: idle-time-out and heartbeat intervals.

On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> I don't think it's a bug - it's a completely valid (if chatty)
> choice.
> 
> To my mind the semantics of the field are this:
> 
> The sender of the open frame is saying "if I do not receive any data
> for this period of time, I reserve the right to assume that you are
> no
> longer functioning correctly"
> On receiving this information, the peer should decide on a strategy
> that ensures to the best of its ability, that if it is functioning
> normally it will be generating data sufficiently frequently that the
> condition does not occur.��If this peer knows that it is susceptible
> to delays outside its direct control (such as garbage collection
> pauses) it should take account of this in how often it schedules
> sending heartbeat frames.
> At the same time, the sender of this value should give some leeway
> before actually shutting down to account for unexpected transport
> delays, or processing delays on its side.��Since it is unaware of the
> nature of the allowances made at the peer, it may decide to be
> particularly generous in the leeway it grants.
> 
> I think the spec mentioning a ratio (such as twice) is massively
> unhelpful.
> 
> In terms of the API - what is the applications intent in setting the
> value?��Is the intent to ensure transport activity within a specific
> period, or to set a hard limit on how long the library will wait
> before generating some sort of timeout exception?

My guess is that in terms of the API, the application's intent in
calling "set_idle_timeout(T)" is to set the formally-specified AMQP
value called "idle-timeout" to T. I don't think anybody's first reading
of the API doc or the spec would suggest to them that if they want to
set a wire timeout of T they actually need to call
set_idle_timeout(T*2) on one side of the connection and it will
magically come out as T on the other.

Spec wording aside, does anybody actually disagree with that?


> -- Rob
> 
> 
> On 30 September 2016 at 14:25, Ken Giusti <kg...@redhat.com> wrote:
> > 
> > CC'ing user's list
> > 
> > Given the interpretations of the spec discussed below is the way
> > Proton handles this functionality inconsistent with itself?
> > 
> > 1) Proton sends an open frame with _half_ the value of the timeout
> > as set by the application.
> > 2) Proton interprets the idle-timeout in the received open as the
> > actual time out interval, and pessimistically ;) generates
> > heartbeats as 1/2 that value.
> > 
> > Is this a bug in the Proton implementation?
> > 
> > -K
> > 
> > ----- Original Message -----
> > > 
> > > From: "Alan Conway" <ac...@redhat.com>
> > > To: dev@qpid.apache.org
> > > Sent: Thursday, September 29, 2016 9:11:51 AM
> > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > intervals.
> > > 
> > > On Wed, 2016-09-28 at 22:33 +0100, Robbie Gemmell wrote:
> > > > 
> > > > The spec unfortunately says what it says, and even if there
> > > > were
> > > > scope
> > > > to update it, I'm not sure there is a way to change it to
> > > > address the
> > > > main issue (which regardless of any other clarity issues, is I
> > > > think
> > > > just that it only says you SHOULD advertise half your 'actual
> > > > timeout') that wouldn't result in an 'incompatible change' of
> > > > behaviour.
> > > 
> > > I look at this differently. We agree on the semantics of idle-
> > > time-out
> > > defined by the spec. Now forget the spec wording, it is easy to
> > > explain
> > > clearly what the semantics are. It is not "half your actual time
> > > out",
> > > it is the *exact* frame rate you expect from your peer. The
> > > peer's
> > > *only* task is to respect that frame rate, they do not need any
> > > more
> > > information than that - they certainly don't need to guess at
> > > what your
> > > margin for error is for keeping the connection open.
> > > 
> > > The margin for error is an implementation detail that does *not*
> > > need
> > > to be advertised. Double the frame rate is a reasonable general
> > > recommendation, but the ideal margin depends on latency
> > > variability in
> > > the system, you can imagine systems where it might be better to
> > > have a
> > > larger or a smaller margin.
> > > 
> > > The name 'idle-time-out' is not an ideal choice, 'heartbeat' or
> > > 'max-
> > > frame-delay' might be better. But once we know what it means, the
> > > semantics are perfectly clear.
> > > 
> > > > 
> > > > Knowing the exact enforced timeout could certainly be clearer
> > > > in some
> > > > ways, but if the period you had to send after wasnt defined
> > > > (e.g
> > > > half)
> > > > then it would essentially leave you with the same problem as
> > > > now:
> > > > deciding exactly how early you should send heartbeat frames to
> > > > avoid
> > > > a
> > > > [more-precisely-known] timeout.
> > > 
> > > The only important thing is to agree on the meaning of the
> > > advertised
> > > value: is it the frame rate or the connection close threshold?
> > > The work
> > > �of adjusting at one end or the other is equivalent.
> > > 
> > > I think we agree it is currently defined as the frame rate, I see
> > > no
> > > benefit in trying to change the meaning. (There would be no
> > > drawback
> > > either if the spec was not already published and implemented by
> > > multiple implementors, but it is)
> > > 
> > > > 
> > > > 
> > > > With what the spec says we effectively have a lower-bound on
> > > > the
> > > > actual-timeout (if cases the peer didnt half their actual-
> > > > timeout
> > > > before advertising a number) and an upper bound (if they did)
> > > > due to
> > > > the use of SHOULD as opposed to MUST or some other more fixed
> > > > definition of behaviour. So we currently try to send frames
> > > > often
> > > > enough to satisfy that lower bound, i.e the number we received.
> > > > That
> > > > essentially reduces it to the the same problem as if we were
> > > > trying
> > > > to
> > > > satisfy a theoretical exactly-known actual-timeout (just using
> > > > a
> > > > smaller number that we dont want to use, because we followed
> > > > the
> > > > specs
> > > > recommendation). Currently we do this by sending frames after
> > > > half
> > > > the
> > > > advertised period, i.e. what we know is actually a quarter of
> > > > the
> > > > actual-timeout enforced by peers such as proton which are
> > > > following
> > > > the specs recommendations. We could choose to do it less often
> > > > than
> > > > that by reducing how conservative it is, e.g use 75+% of
> > > > advertised
> > > > value instead for example, which should have no impact in the
> > > > cases
> > > > where peers had followed the recommendation (as we would still
> > > > be
> > > > sending more than twice as quickly as needed to satisfy their
> > > > actual-timeout), but would put us closer to spurious timeout (
> > > > if e.g
> > > > there was a delay in delivering/processing the heartbeat)
> > > > against any
> > > > peers that didn't follow the recommendation.
> > > > 
> > > > On 28 September 2016 at 21:26, Justin Ross <justin.ross@gmail.c
> > > > om>
> > > > wrote:
> > > > > 
> > > > > 
> > > > > IMO, the overall picture is simpler, and easier to explain to
> > > > > third
> > > > > parties, if we go the way Ken suggested.��When a remote peer
> > > > > sends
> > > > > you an
> > > > > idle timeout value, it is an expression of an (actual, not
> > > > > simply
> > > > > "advertised") guarantee - "I will expire the connection after
> > > > > X
> > > > > time
> > > > > without receiving a frame from you".
> > > > > 
> > > > > We could also legitimately go the direction you
> > > > > suggest.��*But* its
> > > > > name is
> > > > > "idle timeout".��We can't easily change the name.��I think we
> > > > > should take
> > > > > the spec text that goes with the name, and the behavior of
> > > > > our
> > > > > components,
> > > > > firmly in the direction Ken suggests.
> > > > > 
> > > > > Off topic: why is this on the dev list?
> > > > > 
> > > > > 
> > > > > On Wed, Sep 28, 2016 at 12:36 PM, Alan Conway <aconway@redhat
> > > > > .com>
> > > > > wrote:
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > I've had a hand in the way Proton/C interprets the
> > > > > > > meaning of
> > > > > > > 'idle-
> > > > > > > timeout' and I've never liked the solution.��I think
> > > > > > > Proton/C's
> > > > > > > behavior is not 'pessimistic' as much as it is
> > > > > > > 'conservative'
> > > > > > > for the
> > > > > > > sake of interoperability.��This, unfortunately ends up
> > > > > > > with a
> > > > > > > needless idle frame chattiness when both ends are Proton-
> > > > > > > based.
> > > > > > > 
> > > > > > > ----- Original Message -----
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > From: "Rob Godfrey" <ro...@gmail.com>
> > > > > > > > To: "qpid" <de...@qpid.apache.org>
> > > > > > > > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > > > > > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > > > > > intervals.
> > > > > > > > 
> > > > > > > > I agree that specifying that the communicated figure
> > > > > > > > should
> > > > > > > > be
> > > > > > > > "half"
> > > > > > > > the "actual" timeout was a mistake.
> > > > > > > > 
> > > > > > > > What the spec should have tried to communicate is that
> > > > > > > > the
> > > > > > > > sender
> > > > > > > > should communicate a value somewhat less than the
> > > > > > > > period it
> > > > > > > > uses to
> > > > > > > > determine that the connection has actually timed-out to
> > > > > > > > allow
> > > > > > > > for
> > > > > > > > the
> > > > > > > > receiver to process and emit a heartbeat frame.
> > > > > > > 
> > > > > > > 
> > > > > > > Wouldn't it be much clearer to simply send the _actual_
> > > > > > > idle
> > > > > > > timeout
> > > > > > > value?
> > > > > > 
> > > > > > My read is that is exactly what it does: It sends the max
> > > > > > time
> > > > > > that the
> > > > > > *sender* of frames may be idle. The receiver of frames
> > > > > > SHOULD be
> > > > > > more
> > > > > > patient than that. The wording of the "discussion" around
> > > > > > it and
> > > > > > the
> > > > > > choice of terms is a bit cloudy but, the text that
> > > > > > describes
> > > > > > idle-time-
> > > > > > out seems clear enough: it is the max interval between
> > > > > > sending
> > > > > > frames.
> > > > > > The frame receiver SHOULD wait longer that that before
> > > > > > closing,
> > > > > > and 2x
> > > > > > seems a reasonable suggestion, but that's for the impl to
> > > > > > decide.
> > > > > > 
> > > > > > It's weird that it says "idle-time-out should be half the
> > > > > > threshold"
> > > > > > instead of "the threshold should be twice the idle-time-
> > > > > > out" but
> > > > > > it's
> > > > > > logically equivalent.
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Having the spec suggest "communicating a value *somewhat
> > > > > > > less*"
> > > > > > 
> > > > > > The wording is odd but the semantics are you communicate
> > > > > > *exactly* the
> > > > > > max frame delay you want and then you SHOULD set your
> > > > > > connection
> > > > > > close
> > > > > > threshold to something bigger. The other end doesn't need
> > > > > > to know
> > > > > > how
> > > > > > much bigger, they just need to know what rate to send
> > > > > > frames.
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > [emphasis mine] leaves the implementation open for
> > > > > > > interpretation -
> > > > > > > which is exactly how we got into this mess in the first
> > > > > > > place.��Developers are a smart bunch - they know that
> > > > > > > keep
> > > > > > > alive
> > > > > > > traffic will have to be sent frequently enough to prevent
> > > > > > > idle
> > > > > > > timeout.
> > > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > �Similarly the sender
> > > > > > > > should ensure that a frame has been emitted well within
> > > > > > > > the
> > > > > > > > timeout
> > > > > > > > period to allow for any communication / processing
> > > > > > > > delay.
> > > > > > > 
> > > > > > > Agreed - perfectly acceptable for the spec to point this
> > > > > > > out.
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > �In practice
> > > > > > > > these "wiggle room" factors should not be determined by
> > > > > > > > the
> > > > > > > > application level timeout setting but by sensible
> > > > > > > > calculations on
> > > > > > > > transport delay variance / processing time,
> > > > > > > > etc...��these
> > > > > > > > calculation
> > > > > > > > may differ between different use-cases / environments
> > > > > > > > (for
> > > > > > > > example
> > > > > > > > in
> > > > > > > > a low latency / real-time environment you may be able
> > > > > > > > to make
> > > > > > > > hard
> > > > > > > > guarantees about the number of milliseconds that
> > > > > > > > communication /
> > > > > > > > processing delay will take... on the other hand if you
> > > > > > > > are
> > > > > > > > using an
> > > > > > > > interpreted language with stop-the-world garbage
> > > > > > > > collection
> > > > > > > > you may
> > > > > > > > not be able to say much better than the delay should be
> > > > > > > > less
> > > > > > > > than
> > > > > > > > 30s
> > > > > > > > or whatever).
> > > > > > > > 
> > > > > > > 
> > > > > > > Yes - very important things to keep in mind when
> > > > > > > implementing
> > > > > > > this.��But the spec shouldn't be making these suggestions
> > > > > > > for
> > > > > > > different implementation options. The spec should be as
> > > > > > > concise
> > > > > > > as
> > > > > > > possible about the mandated behavior, and leave the
> > > > > > > implementation to
> > > > > > > the developers.
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I think application level APIs should be in terms of
> > > > > > > > the
> > > > > > > > timeouts
> > > > > > > > that
> > > > > > > > will affect the application.��The AMQP library should
> > > > > > > > be
> > > > > > > > massaging
> > > > > > > > those numbers in such a way that they can fulfil the
> > > > > > > > application
> > > > > > > > requirements.
> > > > > > > > 
> > > > > > > 
> > > > > > > Agreed.��Now, is there _any_ way we can suggest an update
> > > > > > > to
> > > > > > > the
> > > > > > > spec?��Perhaps an errata, etc?
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -- Rob
> > > > > > > > 
> > > > > > > > On 28 September 2016 at 10:42, Robbie Gemmell
> > > > > > > > <robbie.gemmell
> > > > > > > > @gmail
> > > > > > > > .com>
> > > > > > > > wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On 27 September 2016 at 22:24, Alan Conway <aconway@r
> > > > > > > > > edhat.
> > > > > > > > > com>
> > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway
> > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > I want to clarify and document the meaning of
> > > > > > > > > > > these
> > > > > > > > > > > terms for
> > > > > > > > > > > our
> > > > > > > > > > > APIs,
> > > > > > > > > > > presently I can't find anywhere where they are
> > > > > > > > > > > documented
> > > > > > > > > > > clearly.
> > > > > > > > > > > 
> > > > > > > > > > > The AMQP spec says: "Each peer has its own
> > > > > > > > > > > (independent) idle
> > > > > > > > > > > timeout.
> > > > > > > > > > > At connection open each peer communicates the
> > > > > > > > > > > maximum
> > > > > > > > > > > period between activity (frames) on the
> > > > > > > > > > > connection that
> > > > > > > > > > > it
> > > > > > > > > > > desires
> > > > > > > > > > > from
> > > > > > > > > > > its partner.The open frame carries the idletime-
> > > > > > > > > > > out
> > > > > > > > > > > field for this purpose. To avoid spurious
> > > > > > > > > > > timeouts, the
> > > > > > > > > > > value
> > > > > > > > > > > in
> > > > > > > > > > > idle-
> > > > > > > > > > > time-out SHOULD be half the peer\u2019s
> > > > > > > > > > > actual timeout threshold."
> > > > > > > > > > > 
> > > > > > > > > > > In other words: if I send you an "open" frame
> > > > > > > > > > > with
> > > > > > > > > > > idle-time-
> > > > > > > > > > > out=N
> > > > > > > > > > > that
> > > > > > > > > > > means *you* should not wait for longer than N
> > > > > > > > > > > milliseconds to
> > > > > > > > > > > send a
> > > > > > > > > > > frame to me. It does not mean *I* will close the
> > > > > > > > > > > connection
> > > > > > > > > > > after N
> > > > > > > > > > > milliseconds, I SHOULD be more patient and wait
> > > > > > > > > > > for N*2
> > > > > > > > > > > ms to
> > > > > > > > > > > avoid
> > > > > > > > > > > closing prematurely due to minor timing wobbles.
> > > > > > > > > > > 
> > > > > > > > > > > I think the choice of name is slightly ambiguous
> > > > > > > > > > > but
> > > > > > > > > > > the spec
> > > > > > > > > > > is
> > > > > > > > > > > clear
> > > > > > > > > > > on the semantics, so it's important to document
> > > > > > > > > > > it to
> > > > > > > > > > > remove
> > > > > > > > > > > the
> > > > > > > > > > > ambiguity.
> > > > > > > > > > > 
> > > > > > > > > > > Anybody disagree?
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > Sigh. Sadly proton-C interprets "idle-timeout"
> > > > > > > > > > differently
> > > > > > > > > > depending on
> > > > > > > > > > which end of the connection you are on:
> > > > > > > > > > 
> > > > > > > > > > ������// as per the recommendation in the spec,
> > > > > > > > > > advertise
> > > > > > > > > > half
> > > > > > > > > > our
> > > > > > > > > > ������// actual timeout to the remote
> > > > > > > > > > ������const pn_millis_t idle_timeout = transport-
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > local_idle_timeout
> > > > > > > > > > ����������? (transport->local_idle_timeout/2)
> > > > > > > > > > ����������: 0;
> > > > > > > > > > 
> > > > > > > > > > So in proton, pn_set_idle_timeout does NOT mean set
> > > > > > > > > > the
> > > > > > > > > > AMQP
> > > > > > > > > > idle-
> > > > > > > > > > timeout value, it means set the local "receive
> > > > > > > > > > timeout"
> > > > > > > > > > value
> > > > > > > > > > and send
> > > > > > > > > > half that as the AMQP "send timeout" for the peer.
> > > > > > > > > > 
> > > > > > > > > > I'm tempted to use a new term in the Go API:
> > > > > > > > > > "heartbeat".
> > > > > > > > > > To me
> > > > > > > > > > that
> > > > > > > > > > clearly means the "send timeout" (hearts beat, they
> > > > > > > > > > don't
> > > > > > > > > > listen for
> > > > > > > > > > beats) so it coincides with the meaning of the AMQP
> > > > > > > > > > "idle-
> > > > > > > > > > timeout", but
> > > > > > > > > > without the ambiguity that is exacerbated by proton
> > > > > > > > > > interpreting it
> > > > > > > > > > both ways.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Proton may seem to behave differently on each end,
> > > > > > > > > but I
> > > > > > > > > don't
> > > > > > > > > think
> > > > > > > > > its necessarily a bad thing that it does, and it is
> > > > > > > > > also I
> > > > > > > > > think
> > > > > > > > > largely just reflecting an annoying bit in the spec
> > > > > > > > > around
> > > > > > > > > this
> > > > > > > > > where
> > > > > > > > > different behaviours are allowed for, whereas it
> > > > > > > > > would be
> > > > > > > > > easier
> > > > > > > > > if it
> > > > > > > > > had less wiggle room.
> > > > > > > > > 
> > > > > > > > > The transport setter/getter for the local timeout
> > > > > > > > > takes the
> > > > > > > > > 'actual
> > > > > > > > > timeout' and then sends half of it as the advertised
> > > > > > > > > value
> > > > > > > > > in the
> > > > > > > > > Open
> > > > > > > > > sent. This makes a certain amount of sense since it
> > > > > > > > > ensures
> > > > > > > > > that
> > > > > > > > > appropriate behaviour is actually satisfied, rather
> > > > > > > > > than
> > > > > > > > > expecting the
> > > > > > > > > user to ensure they only give half the value they
> > > > > > > > > really
> > > > > > > > > want for
> > > > > > > > > their actual timeout. The getter for the remote
> > > > > > > > > timeout
> > > > > > > > > value on
> > > > > > > > > the
> > > > > > > > > other hand returns the advertised value from the Open
> > > > > > > > > that
> > > > > > > > > is
> > > > > > > > > received. I expect it does that since it cant
> > > > > > > > > actually ever
> > > > > > > > > return the
> > > > > > > > > remotes 'actual timeout' without making an
> > > > > > > > > assumption, i.e
> > > > > > > > > that
> > > > > > > > > they
> > > > > > > > > did in fact advertise half (or less) of their actual
> > > > > > > > > timeout,
> > > > > > > > > which
> > > > > > > > > the spec only says that they SHOULD do.
> > > > > > > > > 
> > > > > > > > > Yes the local setter taking the advertised value may
> > > > > > > > > have
> > > > > > > > > been
> > > > > > > > > better
> > > > > > > > > for method consistency with the remote getter. On the
> > > > > > > > > other
> > > > > > > > > hand,
> > > > > > > > > sending of necessary heartbeats is handled directly
> > > > > > > > > by the
> > > > > > > > > transport
> > > > > > > > > during the tick process, so users may not necessarily
> > > > > > > > > even
> > > > > > > > > use
> > > > > > > > > the
> > > > > > > > > getter themselves, and proton uses that remote value
> > > > > > > > > internally
> > > > > > > > > by
> > > > > > > > > pessimistically halfing it to account for the case
> > > > > > > > > that
> > > > > > > > > folks on
> > > > > > > > > the
> > > > > > > > > other end did not advertise half their actual timeout
> > > > > > > > > (since the
> > > > > > > > > spec
> > > > > > > > > doesnt require that they do). Side note: proton could
> > > > > > > > > arguably be
> > > > > > > > > less
> > > > > > > > > pessimistic here and go for say a percentage much
> > > > > > > > > nearer
> > > > > > > > > the full
> > > > > > > > > advertised value, but then you'd probably need to
> > > > > > > > > start
> > > > > > > > > guaging
> > > > > > > > > how
> > > > > > > > > close is too close.
> > > > > > > > > 
> > > > > > > > > I think ensuring the doccumentation on the methods is
> > > > > > > > > clear
> > > > > > > > > what
> > > > > > > > > they
> > > > > > > > > do is sufficient enough here. I actually prefer idle-
> > > > > > > > > timeout as
> > > > > > > > > an
> > > > > > > > > name rather than heartbeat due to the way this all
> > > > > > > > > works.
> > > > > > > > > Since
> > > > > > > > > you
> > > > > > > > > only tell the other side [half] your timeout, you
> > > > > > > > > dont
> > > > > > > > > actually
> > > > > > > > > have
> > > > > > > > > direct control over when they send any needed empty
> > > > > > > > > frames
> > > > > > > > > to
> > > > > > > > > satisfy
> > > > > > > > > it (as the above shows, we might send them more often
> > > > > > > > > than
> > > > > > > > > they
> > > > > > > > > require) and 'heartbeat' might seem to imply that you
> > > > > > > > > do,
> > > > > > > > > and
> > > > > > > > > possibly
> > > > > > > > > even that they need be sent at that period all the
> > > > > > > > > time
> > > > > > > > > even
> > > > > > > > > despite
> > > > > > > > > regular traffic, which is not the case.
> > > > > > > > > 
> > > > > > > > > Robbie
> > > > > > > > > 
> > > > > > > > > ---------------------------------------------------
> > > > > > > > > ------
> > > > > > > > > ------
> > > > > > > > > ------
> > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.o
> > > > > > > > > rg
> > > > > > > > > For additional commands, e-mail: dev-help@qpid.apache
> > > > > > > > > .org
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > -----------------------------------------------------
> > > > > > > > ------
> > > > > > > > ------
> > > > > > > > ----
> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > > For additional commands, e-mail: dev-help@qpid.apache.o
> > > > > > > > rg
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > ---------------------------------------------------------
> > > > > > ------
> > > > > > ------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > > 
> > > > > > 
> > > > 
> > > > -------------------------------------------------------------
> > > > --------
> > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > 
> > > 
> > > 
> > > ---------------------------------------------------------------
> > > ------
> > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > 
> > > 
> > 
> > --
> > -K
> > 
> > -----------------------------------------------------------------
> > ----
> > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > For additional commands, e-mail: users-help@qpid.apache.org
> > 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Rob Godfrey <ro...@gmail.com>.

OK - that (with the above documentation) seems fine to me - I like the use
of Heartbeat to properly from the notion of the timeout threshold.

-- Rob

On 17 October 2016 at 21:09, Alan Conway <ac...@redhat.com> wrote:

> On Mon, 2016-10-17 at 17:08 +0100, Rob Godfrey wrote:
> > If this API is not also making any decision on the (local) "timeout
> > threshold" then that seems eminently reasonable (and what anyone with
> > a
> > passing familiarity of the spec would expect).
> >
> > If the API is using the same call to also determine when the library
> > will
> > actually consider the peer has ceased to function, then I would be
> > more
> > concerned (albeit in an abstract way as I'm not actually currently
> > using
> > the API at all :-) ).
> >
>
> Here's what I am considering for Go:
>
> Called by the end initiating the connection:
>
> // Heartbeat returns a ConnectionOption that requests the maximum delay
> // between sending frames for the remote peer. If we don't receive any
> frames
> // within 2*delay we will close the connection.
> //
> func Heartbeat(delay time.Duration) ConnectionOption {
>         // Proton-C divides the idle-timeout by 2 before sending, so
> compensate.
>         return func(c *connection) { c.engine.Transport().SetIdleTimeout(2
> * delay) }
> }
>
> Called to query an incoming connection:
>
>         // Heartbeat is the maximum delay between sending frames that the
> remote peer
>         // has requested of us. If the interval expires an empty
> "heartbeat" frame
>         // will be sent automatically to keep the connection open.
>         Heartbeat() time.Duration
>
> Key point is there is just *one* number that is passed to the API, sent
> on the wire, and available to the other end, and it has a clearly
> described meaning. The built-in automatic behavior is also described.
>
> IMO using "idle timeout" to mean one thing at one end of the
> connection, and a different thing at the other (and wait, which one
> goes on the wire again?) is insane.
>
> > -- Rob
> >
> > On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> >
> > >
> > > On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> > > >
> > > > I don't think it's a bug - it's a completely valid (if chatty)
> > > > choice.
> > > >
> > > > To my mind the semantics of the field are this:
> > > >
> > > > The sender of the open frame is saying "if I do not receive any
> > > > data
> > > > for this period of time, I reserve the right to assume that you
> > > > are
> > > > no
> > > > longer functioning correctly"
> > > > On receiving this information, the peer should decide on a
> > > > strategy
> > > > that ensures to the best of its ability, that if it is
> > > > functioning
> > > > normally it will be generating data sufficiently frequently that
> > > > the
> > > > condition does not occur.  If this peer knows that it is
> > > > susceptible
> > > > to delays outside its direct control (such as garbage collection
> > > > pauses) it should take account of this in how often it schedules
> > > > sending heartbeat frames.
> > > > At the same time, the sender of this value should give some
> > > > leeway
> > > > before actually shutting down to account for unexpected transport
> > > > delays, or processing delays on its side.  Since it is unaware of
> > > > the
> > > > nature of the allowances made at the peer, it may decide to be
> > > > particularly generous in the leeway it grants.
> > > >
> > > > I think the spec mentioning a ratio (such as twice) is massively
> > > > unhelpful.
> > > >
> > > > In terms of the API - what is the applications intent in setting
> > > > the
> > > > value?  Is the intent to ensure transport activity within a
> > > > specific
> > > > period, or to set a hard limit on how long the library will wait
> > > > before generating some sort of timeout exception?
> > >
> > > My guess is that in terms of the API, the application's intent in
> > > calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> > > value called "idle-timeout" to T. I don't think anybody's first
> > > reading
> > > of the API doc or the spec would suggest to them that if they want
> > > to
> > > set a wire timeout of T they actually need to call
> > > set_idle_timeout(T*2) on one side of the connection and it will
> > > magically come out as T on the other.
> > >
> > > Spec wording aside, does anybody actually disagree with that?
> > >
> > >
> > > >
> > > > -- Rob
> > > >
> > > >
> > > > On 30 September 2016 at 14:25, Ken Giusti <kg...@redhat.com>
> > > > wrote:
> > > > >
> > > > >
> > > > > CC'ing user's list
> > > > >
> > > > > Given the interpretations of the spec discussed below is the
> > > > > way
> > > > > Proton handles this functionality inconsistent with itself?
> > > > >
> > > > > 1) Proton sends an open frame with _half_ the value of the
> > > > > timeout
> > > > > as set by the application.
> > > > > 2) Proton interprets the idle-timeout in the received open as
> > > > > the
> > > > > actual time out interval, and pessimistically ;) generates
> > > > > heartbeats as 1/2 that value.
> > > > >
> > > > > Is this a bug in the Proton implementation?
> > > > >
> > > > > -K
> > > > >
> > > > > ----- Original Message -----
> > > > > >
> > > > > >
> > > > > > From: "Alan Conway" <ac...@redhat.com>
> > > > > > To: dev@qpid.apache.org
> > > > > > Sent: Thursday, September 29, 2016 9:11:51 AM
> > > > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > > > intervals.
> > > > > >
> > > > > > On Wed, 2016-09-28 at 22:33 +0100, Robbie Gemmell wrote:
> > > > > > >
> > > > > > >
> > > > > > > The spec unfortunately says what it says, and even if there
> > > > > > > were
> > > > > > > scope
> > > > > > > to update it, I'm not sure there is a way to change it to
> > > > > > > address the
> > > > > > > main issue (which regardless of any other clarity issues,
> > > > > > > is I
> > > > > > > think
> > > > > > > just that it only says you SHOULD advertise half your
> > > > > > > 'actual
> > > > > > > timeout') that wouldn't result in an 'incompatible change'
> > > > > > > of
> > > > > > > behaviour.
> > > > > >
> > > > > > I look at this differently. We agree on the semantics of
> > > > > > idle-
> > > > > > time-out
> > > > > > defined by the spec. Now forget the spec wording, it is easy
> > > > > > to
> > > > > > explain
> > > > > > clearly what the semantics are. It is not "half your actual
> > > > > > time
> > > > > > out",
> > > > > > it is the *exact* frame rate you expect from your peer. The
> > > > > > peer's
> > > > > > *only* task is to respect that frame rate, they do not need
> > > > > > any
> > > > > > more
> > > > > > information than that - they certainly don't need to guess at
> > > > > > what your
> > > > > > margin for error is for keeping the connection open.
> > > > > >
> > > > > > The margin for error is an implementation detail that does
> > > > > > *not*
> > > > > > need
> > > > > > to be advertised. Double the frame rate is a reasonable
> > > > > > general
> > > > > > recommendation, but the ideal margin depends on latency
> > > > > > variability in
> > > > > > the system, you can imagine systems where it might be better
> > > > > > to
> > > > > > have a
> > > > > > larger or a smaller margin.
> > > > > >
> > > > > > The name 'idle-time-out' is not an ideal choice, 'heartbeat'
> > > > > > or
> > > > > > 'max-
> > > > > > frame-delay' might be better. But once we know what it means,
> > > > > > the
> > > > > > semantics are perfectly clear.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Knowing the exact enforced timeout could certainly be
> > > > > > > clearer
> > > > > > > in some
> > > > > > > ways, but if the period you had to send after wasnt defined
> > > > > > > (e.g
> > > > > > > half)
> > > > > > > then it would essentially leave you with the same problem
> > > > > > > as
> > > > > > > now:
> > > > > > > deciding exactly how early you should send heartbeat frames
> > > > > > > to
> > > > > > > avoid
> > > > > > > a
> > > > > > > [more-precisely-known] timeout.
> > > > > >
> > > > > > The only important thing is to agree on the meaning of the
> > > > > > advertised
> > > > > > value: is it the frame rate or the connection close
> > > > > > threshold?
> > > > > > The work
> > > > > >  of adjusting at one end or the other is equivalent.
> > > > > >
> > > > > > I think we agree it is currently defined as the frame rate, I
> > > > > > see
> > > > > > no
> > > > > > benefit in trying to change the meaning. (There would be no
> > > > > > drawback
> > > > > > either if the spec was not already published and implemented
> > > > > > by
> > > > > > multiple implementors, but it is)
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > With what the spec says we effectively have a lower-bound
> > > > > > > on
> > > > > > > the
> > > > > > > actual-timeout (if cases the peer didnt half their actual-
> > > > > > > timeout
> > > > > > > before advertising a number) and an upper bound (if they
> > > > > > > did)
> > > > > > > due to
> > > > > > > the use of SHOULD as opposed to MUST or some other more
> > > > > > > fixed
> > > > > > > definition of behaviour. So we currently try to send frames
> > > > > > > often
> > > > > > > enough to satisfy that lower bound, i.e the number we
> > > > > > > received.
> > > > > > > That
> > > > > > > essentially reduces it to the the same problem as if we
> > > > > > > were
> > > > > > > trying
> > > > > > > to
> > > > > > > satisfy a theoretical exactly-known actual-timeout (just
> > > > > > > using
> > > > > > > a
> > > > > > > smaller number that we dont want to use, because we
> > > > > > > followed
> > > > > > > the
> > > > > > > specs
> > > > > > > recommendation). Currently we do this by sending frames
> > > > > > > after
> > > > > > > half
> > > > > > > the
> > > > > > > advertised period, i.e. what we know is actually a quarter
> > > > > > > of
> > > > > > > the
> > > > > > > actual-timeout enforced by peers such as proton which are
> > > > > > > following
> > > > > > > the specs recommendations. We could choose to do it less
> > > > > > > often
> > > > > > > than
> > > > > > > that by reducing how conservative it is, e.g use 75+% of
> > > > > > > advertised
> > > > > > > value instead for example, which should have no impact in
> > > > > > > the
> > > > > > > cases
> > > > > > > where peers had followed the recommendation (as we would
> > > > > > > still
> > > > > > > be
> > > > > > > sending more than twice as quickly as needed to satisfy
> > > > > > > their
> > > > > > > actual-timeout), but would put us closer to spurious
> > > > > > > timeout (
> > > > > > > if e.g
> > > > > > > there was a delay in delivering/processing the heartbeat)
> > > > > > > against any
> > > > > > > peers that didn't follow the recommendation.
> > > > > > >
> > > > > > > On 28 September 2016 at 21:26, Justin Ross <justin.ross@gma
> > > > > > > il.c
> > > > > > > om>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > IMO, the overall picture is simpler, and easier to
> > > > > > > > explain to
> > > > > > > > third
> > > > > > > > parties, if we go the way Ken suggested.  When a remote
> > > > > > > > peer
> > > > > > > > sends
> > > > > > > > you an
> > > > > > > > idle timeout value, it is an expression of an (actual,
> > > > > > > > not
> > > > > > > > simply
> > > > > > > > "advertised") guarantee - "I will expire the connection
> > > > > > > > after
> > > > > > > > X
> > > > > > > > time
> > > > > > > > without receiving a frame from you".
> > > > > > > >
> > > > > > > > We could also legitimately go the direction you
> > > > > > > > suggest.  *But* its
> > > > > > > > name is
> > > > > > > > "idle timeout".  We can't easily change the name.  I
> > > > > > > > think we
> > > > > > > > should take
> > > > > > > > the spec text that goes with the name, and the behavior
> > > > > > > > of
> > > > > > > > our
> > > > > > > > components,
> > > > > > > > firmly in the direction Ken suggests.
> > > > > > > >
> > > > > > > > Off topic: why is this on the dev list?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Sep 28, 2016 at 12:36 PM, Alan Conway <aconway@re
> > > > > > > > dhat
> > > > > > > > .com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I've had a hand in the way Proton/C interprets the
> > > > > > > > > > meaning of
> > > > > > > > > > 'idle-
> > > > > > > > > > timeout' and I've never liked the solution.  I think
> > > > > > > > > > Proton/C's
> > > > > > > > > > behavior is not 'pessimistic' as much as it is
> > > > > > > > > > 'conservative'
> > > > > > > > > > for the
> > > > > > > > > > sake of interoperability.  This, unfortunately ends
> > > > > > > > > > up
> > > > > > > > > > with a
> > > > > > > > > > needless idle frame chattiness when both ends are
> > > > > > > > > > Proton-
> > > > > > > > > > based.
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > From: "Rob Godfrey" <ro...@gmail.com>
> > > > > > > > > > > To: "qpid" <de...@qpid.apache.org>
> > > > > > > > > > > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > > > > > > > > > > Subject: Re: API and terms: idle-time-out and
> > > > > > > > > > > heartbeat
> > > > > > > > > > > intervals.
> > > > > > > > > > >
> > > > > > > > > > > I agree that specifying that the communicated
> > > > > > > > > > > figure
> > > > > > > > > > > should
> > > > > > > > > > > be
> > > > > > > > > > > "half"
> > > > > > > > > > > the "actual" timeout was a mistake.
> > > > > > > > > > >
> > > > > > > > > > > What the spec should have tried to communicate is
> > > > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > sender
> > > > > > > > > > > should communicate a value somewhat less than the
> > > > > > > > > > > period it
> > > > > > > > > > > uses to
> > > > > > > > > > > determine that the connection has actually timed-
> > > > > > > > > > > out to
> > > > > > > > > > > allow
> > > > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > receiver to process and emit a heartbeat frame.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Wouldn't it be much clearer to simply send the
> > > > > > > > > > _actual_
> > > > > > > > > > idle
> > > > > > > > > > timeout
> > > > > > > > > > value?
> > > > > > > > >
> > > > > > > > > My read is that is exactly what it does: It sends the
> > > > > > > > > max
> > > > > > > > > time
> > > > > > > > > that the
> > > > > > > > > *sender* of frames may be idle. The receiver of frames
> > > > > > > > > SHOULD be
> > > > > > > > > more
> > > > > > > > > patient than that. The wording of the "discussion"
> > > > > > > > > around
> > > > > > > > > it and
> > > > > > > > > the
> > > > > > > > > choice of terms is a bit cloudy but, the text that
> > > > > > > > > describes
> > > > > > > > > idle-time-
> > > > > > > > > out seems clear enough: it is the max interval between
> > > > > > > > > sending
> > > > > > > > > frames.
> > > > > > > > > The frame receiver SHOULD wait longer that that before
> > > > > > > > > closing,
> > > > > > > > > and 2x
> > > > > > > > > seems a reasonable suggestion, but that's for the impl
> > > > > > > > > to
> > > > > > > > > decide.
> > > > > > > > >
> > > > > > > > > It's weird that it says "idle-time-out should be half
> > > > > > > > > the
> > > > > > > > > threshold"
> > > > > > > > > instead of "the threshold should be twice the idle-
> > > > > > > > > time-
> > > > > > > > > out" but
> > > > > > > > > it's
> > > > > > > > > logically equivalent.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Having the spec suggest "communicating a value
> > > > > > > > > > *somewhat
> > > > > > > > > > less*"
> > > > > > > > >
> > > > > > > > > The wording is odd but the semantics are you
> > > > > > > > > communicate
> > > > > > > > > *exactly* the
> > > > > > > > > max frame delay you want and then you SHOULD set your
> > > > > > > > > connection
> > > > > > > > > close
> > > > > > > > > threshold to something bigger. The other end doesn't
> > > > > > > > > need
> > > > > > > > > to know
> > > > > > > > > how
> > > > > > > > > much bigger, they just need to know what rate to send
> > > > > > > > > frames.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [emphasis mine] leaves the implementation open for
> > > > > > > > > > interpretation -
> > > > > > > > > > which is exactly how we got into this mess in the
> > > > > > > > > > first
> > > > > > > > > > place.  Developers are a smart bunch - they know that
> > > > > > > > > > keep
> > > > > > > > > > alive
> > > > > > > > > > traffic will have to be sent frequently enough to
> > > > > > > > > > prevent
> > > > > > > > > > idle
> > > > > > > > > > timeout.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >  Similarly the sender
> > > > > > > > > > > should ensure that a frame has been emitted well
> > > > > > > > > > > within
> > > > > > > > > > > the
> > > > > > > > > > > timeout
> > > > > > > > > > > period to allow for any communication / processing
> > > > > > > > > > > delay.
> > > > > > > > > >
> > > > > > > > > > Agreed - perfectly acceptable for the spec to point
> > > > > > > > > > this
> > > > > > > > > > out.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >  In practice
> > > > > > > > > > > these "wiggle room" factors should not be
> > > > > > > > > > > determined by
> > > > > > > > > > > the
> > > > > > > > > > > application level timeout setting but by sensible
> > > > > > > > > > > calculations on
> > > > > > > > > > > transport delay variance / processing time,
> > > > > > > > > > > etc...  these
> > > > > > > > > > > calculation
> > > > > > > > > > > may differ between different use-cases /
> > > > > > > > > > > environments
> > > > > > > > > > > (for
> > > > > > > > > > > example
> > > > > > > > > > > in
> > > > > > > > > > > a low latency / real-time environment you may be
> > > > > > > > > > > able
> > > > > > > > > > > to make
> > > > > > > > > > > hard
> > > > > > > > > > > guarantees about the number of milliseconds that
> > > > > > > > > > > communication /
> > > > > > > > > > > processing delay will take... on the other hand if
> > > > > > > > > > > you
> > > > > > > > > > > are
> > > > > > > > > > > using an
> > > > > > > > > > > interpreted language with stop-the-world garbage
> > > > > > > > > > > collection
> > > > > > > > > > > you may
> > > > > > > > > > > not be able to say much better than the delay
> > > > > > > > > > > should be
> > > > > > > > > > > less
> > > > > > > > > > > than
> > > > > > > > > > > 30s
> > > > > > > > > > > or whatever).
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes - very important things to keep in mind when
> > > > > > > > > > implementing
> > > > > > > > > > this.  But the spec shouldn't be making these
> > > > > > > > > > suggestions
> > > > > > > > > > for
> > > > > > > > > > different implementation options. The spec should be
> > > > > > > > > > as
> > > > > > > > > > concise
> > > > > > > > > > as
> > > > > > > > > > possible about the mandated behavior, and leave the
> > > > > > > > > > implementation to
> > > > > > > > > > the developers.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think application level APIs should be in terms
> > > > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > timeouts
> > > > > > > > > > > that
> > > > > > > > > > > will affect the application.  The AMQP library
> > > > > > > > > > > should
> > > > > > > > > > > be
> > > > > > > > > > > massaging
> > > > > > > > > > > those numbers in such a way that they can fulfil
> > > > > > > > > > > the
> > > > > > > > > > > application
> > > > > > > > > > > requirements.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Agreed.  Now, is there _any_ way we can suggest an
> > > > > > > > > > update
> > > > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > spec?  Perhaps an errata, etc?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > -- Rob
> > > > > > > > > > >
> > > > > > > > > > > On 28 September 2016 at 10:42, Robbie Gemmell
> > > > > > > > > > > <robbie.gemmell
> > > > > > > > > > > @gmail
> > > > > > > > > > > .com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On 27 September 2016 at 22:24, Alan Conway <aconw
> > > > > > > > > > > > ay@r
> > > > > > > > > > > > edhat.
> > > > > > > > > > > > com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I want to clarify and document the meaning of
> > > > > > > > > > > > > > these
> > > > > > > > > > > > > > terms for
> > > > > > > > > > > > > > our
> > > > > > > > > > > > > > APIs,
> > > > > > > > > > > > > > presently I can't find anywhere where they
> > > > > > > > > > > > > > are
> > > > > > > > > > > > > > documented
> > > > > > > > > > > > > > clearly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The AMQP spec says: "Each peer has its own
> > > > > > > > > > > > > > (independent) idle
> > > > > > > > > > > > > > timeout.
> > > > > > > > > > > > > > At connection open each peer communicates the
> > > > > > > > > > > > > > maximum
> > > > > > > > > > > > > > period between activity (frames) on the
> > > > > > > > > > > > > > connection that
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > desires
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > its partner.The open frame carries the
> > > > > > > > > > > > > > idletime-
> > > > > > > > > > > > > > out
> > > > > > > > > > > > > > field for this purpose. To avoid spurious
> > > > > > > > > > > > > > timeouts, the
> > > > > > > > > > > > > > value
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > idle-
> > > > > > > > > > > > > > time-out SHOULD be half the peer’s
> > > > > > > > > > > > > > actual timeout threshold."
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In other words: if I send you an "open" frame
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > idle-time-
> > > > > > > > > > > > > > out=N
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > means *you* should not wait for longer than N
> > > > > > > > > > > > > > milliseconds to
> > > > > > > > > > > > > > send a
> > > > > > > > > > > > > > frame to me. It does not mean *I* will close
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > connection
> > > > > > > > > > > > > > after N
> > > > > > > > > > > > > > milliseconds, I SHOULD be more patient and
> > > > > > > > > > > > > > wait
> > > > > > > > > > > > > > for N*2
> > > > > > > > > > > > > > ms to
> > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > closing prematurely due to minor timing
> > > > > > > > > > > > > > wobbles.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think the choice of name is slightly
> > > > > > > > > > > > > > ambiguous
> > > > > > > > > > > > > > but
> > > > > > > > > > > > > > the spec
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > clear
> > > > > > > > > > > > > > on the semantics, so it's important to
> > > > > > > > > > > > > > document
> > > > > > > > > > > > > > it to
> > > > > > > > > > > > > > remove
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > ambiguity.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Anybody disagree?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sigh. Sadly proton-C interprets "idle-timeout"
> > > > > > > > > > > > > differently
> > > > > > > > > > > > > depending on
> > > > > > > > > > > > > which end of the connection you are on:
> > > > > > > > > > > > >
> > > > > > > > > > > > >       // as per the recommendation in the spec,
> > > > > > > > > > > > > advertise
> > > > > > > > > > > > > half
> > > > > > > > > > > > > our
> > > > > > > > > > > > >       // actual timeout to the remote
> > > > > > > > > > > > >       const pn_millis_t idle_timeout =
> > > > > > > > > > > > > transport-
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > local_idle_timeout
> > > > > > > > > > > > >           ? (transport->local_idle_timeout/2)
> > > > > > > > > > > > >           : 0;
> > > > > > > > > > > > >
> > > > > > > > > > > > > So in proton, pn_set_idle_timeout does NOT mean
> > > > > > > > > > > > > set
> > > > > > > > > > > > > the
> > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > idle-
> > > > > > > > > > > > > timeout value, it means set the local "receive
> > > > > > > > > > > > > timeout"
> > > > > > > > > > > > > value
> > > > > > > > > > > > > and send
> > > > > > > > > > > > > half that as the AMQP "send timeout" for the
> > > > > > > > > > > > > peer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm tempted to use a new term in the Go API:
> > > > > > > > > > > > > "heartbeat".
> > > > > > > > > > > > > To me
> > > > > > > > > > > > > that
> > > > > > > > > > > > > clearly means the "send timeout" (hearts beat,
> > > > > > > > > > > > > they
> > > > > > > > > > > > > don't
> > > > > > > > > > > > > listen for
> > > > > > > > > > > > > beats) so it coincides with the meaning of the
> > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > "idle-
> > > > > > > > > > > > > timeout", but
> > > > > > > > > > > > > without the ambiguity that is exacerbated by
> > > > > > > > > > > > > proton
> > > > > > > > > > > > > interpreting it
> > > > > > > > > > > > > both ways.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Proton may seem to behave differently on each
> > > > > > > > > > > > end,
> > > > > > > > > > > > but I
> > > > > > > > > > > > don't
> > > > > > > > > > > > think
> > > > > > > > > > > > its necessarily a bad thing that it does, and it
> > > > > > > > > > > > is
> > > > > > > > > > > > also I
> > > > > > > > > > > > think
> > > > > > > > > > > > largely just reflecting an annoying bit in the
> > > > > > > > > > > > spec
> > > > > > > > > > > > around
> > > > > > > > > > > > this
> > > > > > > > > > > > where
> > > > > > > > > > > > different behaviours are allowed for, whereas it
> > > > > > > > > > > > would be
> > > > > > > > > > > > easier
> > > > > > > > > > > > if it
> > > > > > > > > > > > had less wiggle room.
> > > > > > > > > > > >
> > > > > > > > > > > > The transport setter/getter for the local timeout
> > > > > > > > > > > > takes the
> > > > > > > > > > > > 'actual
> > > > > > > > > > > > timeout' and then sends half of it as the
> > > > > > > > > > > > advertised
> > > > > > > > > > > > value
> > > > > > > > > > > > in the
> > > > > > > > > > > > Open
> > > > > > > > > > > > sent. This makes a certain amount of sense since
> > > > > > > > > > > > it
> > > > > > > > > > > > ensures
> > > > > > > > > > > > that
> > > > > > > > > > > > appropriate behaviour is actually satisfied,
> > > > > > > > > > > > rather
> > > > > > > > > > > > than
> > > > > > > > > > > > expecting the
> > > > > > > > > > > > user to ensure they only give half the value they
> > > > > > > > > > > > really
> > > > > > > > > > > > want for
> > > > > > > > > > > > their actual timeout. The getter for the remote
> > > > > > > > > > > > timeout
> > > > > > > > > > > > value on
> > > > > > > > > > > > the
> > > > > > > > > > > > other hand returns the advertised value from the
> > > > > > > > > > > > Open
> > > > > > > > > > > > that
> > > > > > > > > > > > is
> > > > > > > > > > > > received. I expect it does that since it cant
> > > > > > > > > > > > actually ever
> > > > > > > > > > > > return the
> > > > > > > > > > > > remotes 'actual timeout' without making an
> > > > > > > > > > > > assumption, i.e
> > > > > > > > > > > > that
> > > > > > > > > > > > they
> > > > > > > > > > > > did in fact advertise half (or less) of their
> > > > > > > > > > > > actual
> > > > > > > > > > > > timeout,
> > > > > > > > > > > > which
> > > > > > > > > > > > the spec only says that they SHOULD do.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes the local setter taking the advertised value
> > > > > > > > > > > > may
> > > > > > > > > > > > have
> > > > > > > > > > > > been
> > > > > > > > > > > > better
> > > > > > > > > > > > for method consistency with the remote getter. On
> > > > > > > > > > > > the
> > > > > > > > > > > > other
> > > > > > > > > > > > hand,
> > > > > > > > > > > > sending of necessary heartbeats is handled
> > > > > > > > > > > > directly
> > > > > > > > > > > > by the
> > > > > > > > > > > > transport
> > > > > > > > > > > > during the tick process, so users may not
> > > > > > > > > > > > necessarily
> > > > > > > > > > > > even
> > > > > > > > > > > > use
> > > > > > > > > > > > the
> > > > > > > > > > > > getter themselves, and proton uses that remote
> > > > > > > > > > > > value
> > > > > > > > > > > > internally
> > > > > > > > > > > > by
> > > > > > > > > > > > pessimistically halfing it to account for the
> > > > > > > > > > > > case
> > > > > > > > > > > > that
> > > > > > > > > > > > folks on
> > > > > > > > > > > > the
> > > > > > > > > > > > other end did not advertise half their actual
> > > > > > > > > > > > timeout
> > > > > > > > > > > > (since the
> > > > > > > > > > > > spec
> > > > > > > > > > > > doesnt require that they do). Side note: proton
> > > > > > > > > > > > could
> > > > > > > > > > > > arguably be
> > > > > > > > > > > > less
> > > > > > > > > > > > pessimistic here and go for say a percentage much
> > > > > > > > > > > > nearer
> > > > > > > > > > > > the full
> > > > > > > > > > > > advertised value, but then you'd probably need to
> > > > > > > > > > > > start
> > > > > > > > > > > > guaging
> > > > > > > > > > > > how
> > > > > > > > > > > > close is too close.
> > > > > > > > > > > >
> > > > > > > > > > > > I think ensuring the doccumentation on the
> > > > > > > > > > > > methods is
> > > > > > > > > > > > clear
> > > > > > > > > > > > what
> > > > > > > > > > > > they
> > > > > > > > > > > > do is sufficient enough here. I actually prefer
> > > > > > > > > > > > idle-
> > > > > > > > > > > > timeout as
> > > > > > > > > > > > an
> > > > > > > > > > > > name rather than heartbeat due to the way this
> > > > > > > > > > > > all
> > > > > > > > > > > > works.
> > > > > > > > > > > > Since
> > > > > > > > > > > > you
> > > > > > > > > > > > only tell the other side [half] your timeout, you
> > > > > > > > > > > > dont
> > > > > > > > > > > > actually
> > > > > > > > > > > > have
> > > > > > > > > > > > direct control over when they send any needed
> > > > > > > > > > > > empty
> > > > > > > > > > > > frames
> > > > > > > > > > > > to
> > > > > > > > > > > > satisfy
> > > > > > > > > > > > it (as the above shows, we might send them more
> > > > > > > > > > > > often
> > > > > > > > > > > > than
> > > > > > > > > > > > they
> > > > > > > > > > > > require) and 'heartbeat' might seem to imply that
> > > > > > > > > > > > you
> > > > > > > > > > > > do,
> > > > > > > > > > > > and
> > > > > > > > > > > > possibly
> > > > > > > > > > > > even that they need be sent at that period all
> > > > > > > > > > > > the
> > > > > > > > > > > > time
> > > > > > > > > > > > even
> > > > > > > > > > > > despite
> > > > > > > > > > > > regular traffic, which is not the case.
> > > > > > > > > > > >
> > > > > > > > > > > > Robbie
> > > > > > > > > > > >
> > > > > > > > > > > > -----------------------------------------------
> > > > > > > > > > > > ----
> > > > > > > > > > > > ------
> > > > > > > > > > > > ------
> > > > > > > > > > > > ------
> > > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apac
> > > > > > > > > > > > he.o
> > > > > > > > > > > > rg
> > > > > > > > > > > > For additional commands, e-mail: dev-help@qpid.ap
> > > > > > > > > > > > ache
> > > > > > > > > > > > .org
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > -------------------------------------------------
> > > > > > > > > > > ----
> > > > > > > > > > > ------
> > > > > > > > > > > ------
> > > > > > > > > > > ----
> > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache
> > > > > > > > > > > .org
> > > > > > > > > > > For additional commands, e-mail: dev-help@qpid.apac
> > > > > > > > > > > he.o
> > > > > > > > > > > rg
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -----------------------------------------------------
> > > > > > > > > ----
> > > > > > > > > ------
> > > > > > > > > ------
> > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > > > For additional commands, e-mail: dev-help@qpid.apache.o
> > > > > > > > > rg
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------
> > > > > > > ----
> > > > > > > --------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > > >
> > > > > >
> > > > > >
> > > > > > -----------------------------------------------------------
> > > > > > ----
> > > > > > ------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > -K
> > > > >
> > > > > -------------------------------------------------------------
> > > > > ----
> > > > > ----
> > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > > >
> > > >
> > > > ---------------------------------------------------------------
> > > > ------
> > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > >
> > >
> > >
> > > -----------------------------------------------------------------
> > > ----
> > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: users-help@qpid.apache.org
> > >
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Rob Godfrey <ro...@gmail.com>.

OK - that (with the above documentation) seems fine to me - I like the use
of Heartbeat to properly from the notion of the timeout threshold.

-- Rob

On 17 October 2016 at 21:09, Alan Conway <ac...@redhat.com> wrote:

> On Mon, 2016-10-17 at 17:08 +0100, Rob Godfrey wrote:
> > If this API is not also making any decision on the (local) "timeout
> > threshold" then that seems eminently reasonable (and what anyone with
> > a
> > passing familiarity of the spec would expect).
> >
> > If the API is using the same call to also determine when the library
> > will
> > actually consider the peer has ceased to function, then I would be
> > more
> > concerned (albeit in an abstract way as I'm not actually currently
> > using
> > the API at all :-) ).
> >
>
> Here's what I am considering for Go:
>
> Called by the end initiating the connection:
>
> // Heartbeat returns a ConnectionOption that requests the maximum delay
> // between sending frames for the remote peer. If we don't receive any
> frames
> // within 2*delay we will close the connection.
> //
> func Heartbeat(delay time.Duration) ConnectionOption {
>         // Proton-C divides the idle-timeout by 2 before sending, so
> compensate.
>         return func(c *connection) { c.engine.Transport().SetIdleTimeout(2
> * delay) }
> }
>
> Called to query an incoming connection:
>
>         // Heartbeat is the maximum delay between sending frames that the
> remote peer
>         // has requested of us. If the interval expires an empty
> "heartbeat" frame
>         // will be sent automatically to keep the connection open.
>         Heartbeat() time.Duration
>
> Key point is there is just *one* number that is passed to the API, sent
> on the wire, and available to the other end, and it has a clearly
> described meaning. The built-in automatic behavior is also described.
>
> IMO using "idle timeout" to mean one thing at one end of the
> connection, and a different thing at the other (and wait, which one
> goes on the wire again?) is insane.
>
> > -- Rob
> >
> > On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> >
> > >
> > > On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> > > >
> > > > I don't think it's a bug - it's a completely valid (if chatty)
> > > > choice.
> > > >
> > > > To my mind the semantics of the field are this:
> > > >
> > > > The sender of the open frame is saying "if I do not receive any
> > > > data
> > > > for this period of time, I reserve the right to assume that you
> > > > are
> > > > no
> > > > longer functioning correctly"
> > > > On receiving this information, the peer should decide on a
> > > > strategy
> > > > that ensures to the best of its ability, that if it is
> > > > functioning
> > > > normally it will be generating data sufficiently frequently that
> > > > the
> > > > condition does not occur.  If this peer knows that it is
> > > > susceptible
> > > > to delays outside its direct control (such as garbage collection
> > > > pauses) it should take account of this in how often it schedules
> > > > sending heartbeat frames.
> > > > At the same time, the sender of this value should give some
> > > > leeway
> > > > before actually shutting down to account for unexpected transport
> > > > delays, or processing delays on its side.  Since it is unaware of
> > > > the
> > > > nature of the allowances made at the peer, it may decide to be
> > > > particularly generous in the leeway it grants.
> > > >
> > > > I think the spec mentioning a ratio (such as twice) is massively
> > > > unhelpful.
> > > >
> > > > In terms of the API - what is the applications intent in setting
> > > > the
> > > > value?  Is the intent to ensure transport activity within a
> > > > specific
> > > > period, or to set a hard limit on how long the library will wait
> > > > before generating some sort of timeout exception?
> > >
> > > My guess is that in terms of the API, the application's intent in
> > > calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> > > value called "idle-timeout" to T. I don't think anybody's first
> > > reading
> > > of the API doc or the spec would suggest to them that if they want
> > > to
> > > set a wire timeout of T they actually need to call
> > > set_idle_timeout(T*2) on one side of the connection and it will
> > > magically come out as T on the other.
> > >
> > > Spec wording aside, does anybody actually disagree with that?
> > >
> > >
> > > >
> > > > -- Rob
> > > >
> > > >
> > > > On 30 September 2016 at 14:25, Ken Giusti <kg...@redhat.com>
> > > > wrote:
> > > > >
> > > > >
> > > > > CC'ing user's list
> > > > >
> > > > > Given the interpretations of the spec discussed below is the
> > > > > way
> > > > > Proton handles this functionality inconsistent with itself?
> > > > >
> > > > > 1) Proton sends an open frame with _half_ the value of the
> > > > > timeout
> > > > > as set by the application.
> > > > > 2) Proton interprets the idle-timeout in the received open as
> > > > > the
> > > > > actual time out interval, and pessimistically ;) generates
> > > > > heartbeats as 1/2 that value.
> > > > >
> > > > > Is this a bug in the Proton implementation?
> > > > >
> > > > > -K
> > > > >
> > > > > ----- Original Message -----
> > > > > >
> > > > > >
> > > > > > From: "Alan Conway" <ac...@redhat.com>
> > > > > > To: dev@qpid.apache.org
> > > > > > Sent: Thursday, September 29, 2016 9:11:51 AM
> > > > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > > > intervals.
> > > > > >
> > > > > > On Wed, 2016-09-28 at 22:33 +0100, Robbie Gemmell wrote:
> > > > > > >
> > > > > > >
> > > > > > > The spec unfortunately says what it says, and even if there
> > > > > > > were
> > > > > > > scope
> > > > > > > to update it, I'm not sure there is a way to change it to
> > > > > > > address the
> > > > > > > main issue (which regardless of any other clarity issues,
> > > > > > > is I
> > > > > > > think
> > > > > > > just that it only says you SHOULD advertise half your
> > > > > > > 'actual
> > > > > > > timeout') that wouldn't result in an 'incompatible change'
> > > > > > > of
> > > > > > > behaviour.
> > > > > >
> > > > > > I look at this differently. We agree on the semantics of
> > > > > > idle-
> > > > > > time-out
> > > > > > defined by the spec. Now forget the spec wording, it is easy
> > > > > > to
> > > > > > explain
> > > > > > clearly what the semantics are. It is not "half your actual
> > > > > > time
> > > > > > out",
> > > > > > it is the *exact* frame rate you expect from your peer. The
> > > > > > peer's
> > > > > > *only* task is to respect that frame rate, they do not need
> > > > > > any
> > > > > > more
> > > > > > information than that - they certainly don't need to guess at
> > > > > > what your
> > > > > > margin for error is for keeping the connection open.
> > > > > >
> > > > > > The margin for error is an implementation detail that does
> > > > > > *not*
> > > > > > need
> > > > > > to be advertised. Double the frame rate is a reasonable
> > > > > > general
> > > > > > recommendation, but the ideal margin depends on latency
> > > > > > variability in
> > > > > > the system, you can imagine systems where it might be better
> > > > > > to
> > > > > > have a
> > > > > > larger or a smaller margin.
> > > > > >
> > > > > > The name 'idle-time-out' is not an ideal choice, 'heartbeat'
> > > > > > or
> > > > > > 'max-
> > > > > > frame-delay' might be better. But once we know what it means,
> > > > > > the
> > > > > > semantics are perfectly clear.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Knowing the exact enforced timeout could certainly be
> > > > > > > clearer
> > > > > > > in some
> > > > > > > ways, but if the period you had to send after wasnt defined
> > > > > > > (e.g
> > > > > > > half)
> > > > > > > then it would essentially leave you with the same problem
> > > > > > > as
> > > > > > > now:
> > > > > > > deciding exactly how early you should send heartbeat frames
> > > > > > > to
> > > > > > > avoid
> > > > > > > a
> > > > > > > [more-precisely-known] timeout.
> > > > > >
> > > > > > The only important thing is to agree on the meaning of the
> > > > > > advertised
> > > > > > value: is it the frame rate or the connection close
> > > > > > threshold?
> > > > > > The work
> > > > > >  of adjusting at one end or the other is equivalent.
> > > > > >
> > > > > > I think we agree it is currently defined as the frame rate, I
> > > > > > see
> > > > > > no
> > > > > > benefit in trying to change the meaning. (There would be no
> > > > > > drawback
> > > > > > either if the spec was not already published and implemented
> > > > > > by
> > > > > > multiple implementors, but it is)
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > With what the spec says we effectively have a lower-bound
> > > > > > > on
> > > > > > > the
> > > > > > > actual-timeout (if cases the peer didnt half their actual-
> > > > > > > timeout
> > > > > > > before advertising a number) and an upper bound (if they
> > > > > > > did)
> > > > > > > due to
> > > > > > > the use of SHOULD as opposed to MUST or some other more
> > > > > > > fixed
> > > > > > > definition of behaviour. So we currently try to send frames
> > > > > > > often
> > > > > > > enough to satisfy that lower bound, i.e the number we
> > > > > > > received.
> > > > > > > That
> > > > > > > essentially reduces it to the the same problem as if we
> > > > > > > were
> > > > > > > trying
> > > > > > > to
> > > > > > > satisfy a theoretical exactly-known actual-timeout (just
> > > > > > > using
> > > > > > > a
> > > > > > > smaller number that we dont want to use, because we
> > > > > > > followed
> > > > > > > the
> > > > > > > specs
> > > > > > > recommendation). Currently we do this by sending frames
> > > > > > > after
> > > > > > > half
> > > > > > > the
> > > > > > > advertised period, i.e. what we know is actually a quarter
> > > > > > > of
> > > > > > > the
> > > > > > > actual-timeout enforced by peers such as proton which are
> > > > > > > following
> > > > > > > the specs recommendations. We could choose to do it less
> > > > > > > often
> > > > > > > than
> > > > > > > that by reducing how conservative it is, e.g use 75+% of
> > > > > > > advertised
> > > > > > > value instead for example, which should have no impact in
> > > > > > > the
> > > > > > > cases
> > > > > > > where peers had followed the recommendation (as we would
> > > > > > > still
> > > > > > > be
> > > > > > > sending more than twice as quickly as needed to satisfy
> > > > > > > their
> > > > > > > actual-timeout), but would put us closer to spurious
> > > > > > > timeout (
> > > > > > > if e.g
> > > > > > > there was a delay in delivering/processing the heartbeat)
> > > > > > > against any
> > > > > > > peers that didn't follow the recommendation.
> > > > > > >
> > > > > > > On 28 September 2016 at 21:26, Justin Ross <justin.ross@gma
> > > > > > > il.c
> > > > > > > om>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > IMO, the overall picture is simpler, and easier to
> > > > > > > > explain to
> > > > > > > > third
> > > > > > > > parties, if we go the way Ken suggested.  When a remote
> > > > > > > > peer
> > > > > > > > sends
> > > > > > > > you an
> > > > > > > > idle timeout value, it is an expression of an (actual,
> > > > > > > > not
> > > > > > > > simply
> > > > > > > > "advertised") guarantee - "I will expire the connection
> > > > > > > > after
> > > > > > > > X
> > > > > > > > time
> > > > > > > > without receiving a frame from you".
> > > > > > > >
> > > > > > > > We could also legitimately go the direction you
> > > > > > > > suggest.  *But* its
> > > > > > > > name is
> > > > > > > > "idle timeout".  We can't easily change the name.  I
> > > > > > > > think we
> > > > > > > > should take
> > > > > > > > the spec text that goes with the name, and the behavior
> > > > > > > > of
> > > > > > > > our
> > > > > > > > components,
> > > > > > > > firmly in the direction Ken suggests.
> > > > > > > >
> > > > > > > > Off topic: why is this on the dev list?
> > > > > > > >
> > > > > > > >
> > > > > > > > On Wed, Sep 28, 2016 at 12:36 PM, Alan Conway <aconway@re
> > > > > > > > dhat
> > > > > > > > .com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I've had a hand in the way Proton/C interprets the
> > > > > > > > > > meaning of
> > > > > > > > > > 'idle-
> > > > > > > > > > timeout' and I've never liked the solution.  I think
> > > > > > > > > > Proton/C's
> > > > > > > > > > behavior is not 'pessimistic' as much as it is
> > > > > > > > > > 'conservative'
> > > > > > > > > > for the
> > > > > > > > > > sake of interoperability.  This, unfortunately ends
> > > > > > > > > > up
> > > > > > > > > > with a
> > > > > > > > > > needless idle frame chattiness when both ends are
> > > > > > > > > > Proton-
> > > > > > > > > > based.
> > > > > > > > > >
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > From: "Rob Godfrey" <ro...@gmail.com>
> > > > > > > > > > > To: "qpid" <de...@qpid.apache.org>
> > > > > > > > > > > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > > > > > > > > > > Subject: Re: API and terms: idle-time-out and
> > > > > > > > > > > heartbeat
> > > > > > > > > > > intervals.
> > > > > > > > > > >
> > > > > > > > > > > I agree that specifying that the communicated
> > > > > > > > > > > figure
> > > > > > > > > > > should
> > > > > > > > > > > be
> > > > > > > > > > > "half"
> > > > > > > > > > > the "actual" timeout was a mistake.
> > > > > > > > > > >
> > > > > > > > > > > What the spec should have tried to communicate is
> > > > > > > > > > > that
> > > > > > > > > > > the
> > > > > > > > > > > sender
> > > > > > > > > > > should communicate a value somewhat less than the
> > > > > > > > > > > period it
> > > > > > > > > > > uses to
> > > > > > > > > > > determine that the connection has actually timed-
> > > > > > > > > > > out to
> > > > > > > > > > > allow
> > > > > > > > > > > for
> > > > > > > > > > > the
> > > > > > > > > > > receiver to process and emit a heartbeat frame.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Wouldn't it be much clearer to simply send the
> > > > > > > > > > _actual_
> > > > > > > > > > idle
> > > > > > > > > > timeout
> > > > > > > > > > value?
> > > > > > > > >
> > > > > > > > > My read is that is exactly what it does: It sends the
> > > > > > > > > max
> > > > > > > > > time
> > > > > > > > > that the
> > > > > > > > > *sender* of frames may be idle. The receiver of frames
> > > > > > > > > SHOULD be
> > > > > > > > > more
> > > > > > > > > patient than that. The wording of the "discussion"
> > > > > > > > > around
> > > > > > > > > it and
> > > > > > > > > the
> > > > > > > > > choice of terms is a bit cloudy but, the text that
> > > > > > > > > describes
> > > > > > > > > idle-time-
> > > > > > > > > out seems clear enough: it is the max interval between
> > > > > > > > > sending
> > > > > > > > > frames.
> > > > > > > > > The frame receiver SHOULD wait longer that that before
> > > > > > > > > closing,
> > > > > > > > > and 2x
> > > > > > > > > seems a reasonable suggestion, but that's for the impl
> > > > > > > > > to
> > > > > > > > > decide.
> > > > > > > > >
> > > > > > > > > It's weird that it says "idle-time-out should be half
> > > > > > > > > the
> > > > > > > > > threshold"
> > > > > > > > > instead of "the threshold should be twice the idle-
> > > > > > > > > time-
> > > > > > > > > out" but
> > > > > > > > > it's
> > > > > > > > > logically equivalent.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Having the spec suggest "communicating a value
> > > > > > > > > > *somewhat
> > > > > > > > > > less*"
> > > > > > > > >
> > > > > > > > > The wording is odd but the semantics are you
> > > > > > > > > communicate
> > > > > > > > > *exactly* the
> > > > > > > > > max frame delay you want and then you SHOULD set your
> > > > > > > > > connection
> > > > > > > > > close
> > > > > > > > > threshold to something bigger. The other end doesn't
> > > > > > > > > need
> > > > > > > > > to know
> > > > > > > > > how
> > > > > > > > > much bigger, they just need to know what rate to send
> > > > > > > > > frames.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > [emphasis mine] leaves the implementation open for
> > > > > > > > > > interpretation -
> > > > > > > > > > which is exactly how we got into this mess in the
> > > > > > > > > > first
> > > > > > > > > > place.  Developers are a smart bunch - they know that
> > > > > > > > > > keep
> > > > > > > > > > alive
> > > > > > > > > > traffic will have to be sent frequently enough to
> > > > > > > > > > prevent
> > > > > > > > > > idle
> > > > > > > > > > timeout.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >  Similarly the sender
> > > > > > > > > > > should ensure that a frame has been emitted well
> > > > > > > > > > > within
> > > > > > > > > > > the
> > > > > > > > > > > timeout
> > > > > > > > > > > period to allow for any communication / processing
> > > > > > > > > > > delay.
> > > > > > > > > >
> > > > > > > > > > Agreed - perfectly acceptable for the spec to point
> > > > > > > > > > this
> > > > > > > > > > out.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >  In practice
> > > > > > > > > > > these "wiggle room" factors should not be
> > > > > > > > > > > determined by
> > > > > > > > > > > the
> > > > > > > > > > > application level timeout setting but by sensible
> > > > > > > > > > > calculations on
> > > > > > > > > > > transport delay variance / processing time,
> > > > > > > > > > > etc...  these
> > > > > > > > > > > calculation
> > > > > > > > > > > may differ between different use-cases /
> > > > > > > > > > > environments
> > > > > > > > > > > (for
> > > > > > > > > > > example
> > > > > > > > > > > in
> > > > > > > > > > > a low latency / real-time environment you may be
> > > > > > > > > > > able
> > > > > > > > > > > to make
> > > > > > > > > > > hard
> > > > > > > > > > > guarantees about the number of milliseconds that
> > > > > > > > > > > communication /
> > > > > > > > > > > processing delay will take... on the other hand if
> > > > > > > > > > > you
> > > > > > > > > > > are
> > > > > > > > > > > using an
> > > > > > > > > > > interpreted language with stop-the-world garbage
> > > > > > > > > > > collection
> > > > > > > > > > > you may
> > > > > > > > > > > not be able to say much better than the delay
> > > > > > > > > > > should be
> > > > > > > > > > > less
> > > > > > > > > > > than
> > > > > > > > > > > 30s
> > > > > > > > > > > or whatever).
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes - very important things to keep in mind when
> > > > > > > > > > implementing
> > > > > > > > > > this.  But the spec shouldn't be making these
> > > > > > > > > > suggestions
> > > > > > > > > > for
> > > > > > > > > > different implementation options. The spec should be
> > > > > > > > > > as
> > > > > > > > > > concise
> > > > > > > > > > as
> > > > > > > > > > possible about the mandated behavior, and leave the
> > > > > > > > > > implementation to
> > > > > > > > > > the developers.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I think application level APIs should be in terms
> > > > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > timeouts
> > > > > > > > > > > that
> > > > > > > > > > > will affect the application.  The AMQP library
> > > > > > > > > > > should
> > > > > > > > > > > be
> > > > > > > > > > > massaging
> > > > > > > > > > > those numbers in such a way that they can fulfil
> > > > > > > > > > > the
> > > > > > > > > > > application
> > > > > > > > > > > requirements.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Agreed.  Now, is there _any_ way we can suggest an
> > > > > > > > > > update
> > > > > > > > > > to
> > > > > > > > > > the
> > > > > > > > > > spec?  Perhaps an errata, etc?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > -- Rob
> > > > > > > > > > >
> > > > > > > > > > > On 28 September 2016 at 10:42, Robbie Gemmell
> > > > > > > > > > > <robbie.gemmell
> > > > > > > > > > > @gmail
> > > > > > > > > > > .com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On 27 September 2016 at 22:24, Alan Conway <aconw
> > > > > > > > > > > > ay@r
> > > > > > > > > > > > edhat.
> > > > > > > > > > > > com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I want to clarify and document the meaning of
> > > > > > > > > > > > > > these
> > > > > > > > > > > > > > terms for
> > > > > > > > > > > > > > our
> > > > > > > > > > > > > > APIs,
> > > > > > > > > > > > > > presently I can't find anywhere where they
> > > > > > > > > > > > > > are
> > > > > > > > > > > > > > documented
> > > > > > > > > > > > > > clearly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The AMQP spec says: "Each peer has its own
> > > > > > > > > > > > > > (independent) idle
> > > > > > > > > > > > > > timeout.
> > > > > > > > > > > > > > At connection open each peer communicates the
> > > > > > > > > > > > > > maximum
> > > > > > > > > > > > > > period between activity (frames) on the
> > > > > > > > > > > > > > connection that
> > > > > > > > > > > > > > it
> > > > > > > > > > > > > > desires
> > > > > > > > > > > > > > from
> > > > > > > > > > > > > > its partner.The open frame carries the
> > > > > > > > > > > > > > idletime-
> > > > > > > > > > > > > > out
> > > > > > > > > > > > > > field for this purpose. To avoid spurious
> > > > > > > > > > > > > > timeouts, the
> > > > > > > > > > > > > > value
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > idle-
> > > > > > > > > > > > > > time-out SHOULD be half the peer’s
> > > > > > > > > > > > > > actual timeout threshold."
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In other words: if I send you an "open" frame
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > idle-time-
> > > > > > > > > > > > > > out=N
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > means *you* should not wait for longer than N
> > > > > > > > > > > > > > milliseconds to
> > > > > > > > > > > > > > send a
> > > > > > > > > > > > > > frame to me. It does not mean *I* will close
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > connection
> > > > > > > > > > > > > > after N
> > > > > > > > > > > > > > milliseconds, I SHOULD be more patient and
> > > > > > > > > > > > > > wait
> > > > > > > > > > > > > > for N*2
> > > > > > > > > > > > > > ms to
> > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > closing prematurely due to minor timing
> > > > > > > > > > > > > > wobbles.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think the choice of name is slightly
> > > > > > > > > > > > > > ambiguous
> > > > > > > > > > > > > > but
> > > > > > > > > > > > > > the spec
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > clear
> > > > > > > > > > > > > > on the semantics, so it's important to
> > > > > > > > > > > > > > document
> > > > > > > > > > > > > > it to
> > > > > > > > > > > > > > remove
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > ambiguity.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Anybody disagree?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sigh. Sadly proton-C interprets "idle-timeout"
> > > > > > > > > > > > > differently
> > > > > > > > > > > > > depending on
> > > > > > > > > > > > > which end of the connection you are on:
> > > > > > > > > > > > >
> > > > > > > > > > > > >       // as per the recommendation in the spec,
> > > > > > > > > > > > > advertise
> > > > > > > > > > > > > half
> > > > > > > > > > > > > our
> > > > > > > > > > > > >       // actual timeout to the remote
> > > > > > > > > > > > >       const pn_millis_t idle_timeout =
> > > > > > > > > > > > > transport-
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > local_idle_timeout
> > > > > > > > > > > > >           ? (transport->local_idle_timeout/2)
> > > > > > > > > > > > >           : 0;
> > > > > > > > > > > > >
> > > > > > > > > > > > > So in proton, pn_set_idle_timeout does NOT mean
> > > > > > > > > > > > > set
> > > > > > > > > > > > > the
> > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > idle-
> > > > > > > > > > > > > timeout value, it means set the local "receive
> > > > > > > > > > > > > timeout"
> > > > > > > > > > > > > value
> > > > > > > > > > > > > and send
> > > > > > > > > > > > > half that as the AMQP "send timeout" for the
> > > > > > > > > > > > > peer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm tempted to use a new term in the Go API:
> > > > > > > > > > > > > "heartbeat".
> > > > > > > > > > > > > To me
> > > > > > > > > > > > > that
> > > > > > > > > > > > > clearly means the "send timeout" (hearts beat,
> > > > > > > > > > > > > they
> > > > > > > > > > > > > don't
> > > > > > > > > > > > > listen for
> > > > > > > > > > > > > beats) so it coincides with the meaning of the
> > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > "idle-
> > > > > > > > > > > > > timeout", but
> > > > > > > > > > > > > without the ambiguity that is exacerbated by
> > > > > > > > > > > > > proton
> > > > > > > > > > > > > interpreting it
> > > > > > > > > > > > > both ways.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Proton may seem to behave differently on each
> > > > > > > > > > > > end,
> > > > > > > > > > > > but I
> > > > > > > > > > > > don't
> > > > > > > > > > > > think
> > > > > > > > > > > > its necessarily a bad thing that it does, and it
> > > > > > > > > > > > is
> > > > > > > > > > > > also I
> > > > > > > > > > > > think
> > > > > > > > > > > > largely just reflecting an annoying bit in the
> > > > > > > > > > > > spec
> > > > > > > > > > > > around
> > > > > > > > > > > > this
> > > > > > > > > > > > where
> > > > > > > > > > > > different behaviours are allowed for, whereas it
> > > > > > > > > > > > would be
> > > > > > > > > > > > easier
> > > > > > > > > > > > if it
> > > > > > > > > > > > had less wiggle room.
> > > > > > > > > > > >
> > > > > > > > > > > > The transport setter/getter for the local timeout
> > > > > > > > > > > > takes the
> > > > > > > > > > > > 'actual
> > > > > > > > > > > > timeout' and then sends half of it as the
> > > > > > > > > > > > advertised
> > > > > > > > > > > > value
> > > > > > > > > > > > in the
> > > > > > > > > > > > Open
> > > > > > > > > > > > sent. This makes a certain amount of sense since
> > > > > > > > > > > > it
> > > > > > > > > > > > ensures
> > > > > > > > > > > > that
> > > > > > > > > > > > appropriate behaviour is actually satisfied,
> > > > > > > > > > > > rather
> > > > > > > > > > > > than
> > > > > > > > > > > > expecting the
> > > > > > > > > > > > user to ensure they only give half the value they
> > > > > > > > > > > > really
> > > > > > > > > > > > want for
> > > > > > > > > > > > their actual timeout. The getter for the remote
> > > > > > > > > > > > timeout
> > > > > > > > > > > > value on
> > > > > > > > > > > > the
> > > > > > > > > > > > other hand returns the advertised value from the
> > > > > > > > > > > > Open
> > > > > > > > > > > > that
> > > > > > > > > > > > is
> > > > > > > > > > > > received. I expect it does that since it cant
> > > > > > > > > > > > actually ever
> > > > > > > > > > > > return the
> > > > > > > > > > > > remotes 'actual timeout' without making an
> > > > > > > > > > > > assumption, i.e
> > > > > > > > > > > > that
> > > > > > > > > > > > they
> > > > > > > > > > > > did in fact advertise half (or less) of their
> > > > > > > > > > > > actual
> > > > > > > > > > > > timeout,
> > > > > > > > > > > > which
> > > > > > > > > > > > the spec only says that they SHOULD do.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes the local setter taking the advertised value
> > > > > > > > > > > > may
> > > > > > > > > > > > have
> > > > > > > > > > > > been
> > > > > > > > > > > > better
> > > > > > > > > > > > for method consistency with the remote getter. On
> > > > > > > > > > > > the
> > > > > > > > > > > > other
> > > > > > > > > > > > hand,
> > > > > > > > > > > > sending of necessary heartbeats is handled
> > > > > > > > > > > > directly
> > > > > > > > > > > > by the
> > > > > > > > > > > > transport
> > > > > > > > > > > > during the tick process, so users may not
> > > > > > > > > > > > necessarily
> > > > > > > > > > > > even
> > > > > > > > > > > > use
> > > > > > > > > > > > the
> > > > > > > > > > > > getter themselves, and proton uses that remote
> > > > > > > > > > > > value
> > > > > > > > > > > > internally
> > > > > > > > > > > > by
> > > > > > > > > > > > pessimistically halfing it to account for the
> > > > > > > > > > > > case
> > > > > > > > > > > > that
> > > > > > > > > > > > folks on
> > > > > > > > > > > > the
> > > > > > > > > > > > other end did not advertise half their actual
> > > > > > > > > > > > timeout
> > > > > > > > > > > > (since the
> > > > > > > > > > > > spec
> > > > > > > > > > > > doesnt require that they do). Side note: proton
> > > > > > > > > > > > could
> > > > > > > > > > > > arguably be
> > > > > > > > > > > > less
> > > > > > > > > > > > pessimistic here and go for say a percentage much
> > > > > > > > > > > > nearer
> > > > > > > > > > > > the full
> > > > > > > > > > > > advertised value, but then you'd probably need to
> > > > > > > > > > > > start
> > > > > > > > > > > > guaging
> > > > > > > > > > > > how
> > > > > > > > > > > > close is too close.
> > > > > > > > > > > >
> > > > > > > > > > > > I think ensuring the doccumentation on the
> > > > > > > > > > > > methods is
> > > > > > > > > > > > clear
> > > > > > > > > > > > what
> > > > > > > > > > > > they
> > > > > > > > > > > > do is sufficient enough here. I actually prefer
> > > > > > > > > > > > idle-
> > > > > > > > > > > > timeout as
> > > > > > > > > > > > an
> > > > > > > > > > > > name rather than heartbeat due to the way this
> > > > > > > > > > > > all
> > > > > > > > > > > > works.
> > > > > > > > > > > > Since
> > > > > > > > > > > > you
> > > > > > > > > > > > only tell the other side [half] your timeout, you
> > > > > > > > > > > > dont
> > > > > > > > > > > > actually
> > > > > > > > > > > > have
> > > > > > > > > > > > direct control over when they send any needed
> > > > > > > > > > > > empty
> > > > > > > > > > > > frames
> > > > > > > > > > > > to
> > > > > > > > > > > > satisfy
> > > > > > > > > > > > it (as the above shows, we might send them more
> > > > > > > > > > > > often
> > > > > > > > > > > > than
> > > > > > > > > > > > they
> > > > > > > > > > > > require) and 'heartbeat' might seem to imply that
> > > > > > > > > > > > you
> > > > > > > > > > > > do,
> > > > > > > > > > > > and
> > > > > > > > > > > > possibly
> > > > > > > > > > > > even that they need be sent at that period all
> > > > > > > > > > > > the
> > > > > > > > > > > > time
> > > > > > > > > > > > even
> > > > > > > > > > > > despite
> > > > > > > > > > > > regular traffic, which is not the case.
> > > > > > > > > > > >
> > > > > > > > > > > > Robbie
> > > > > > > > > > > >
> > > > > > > > > > > > -----------------------------------------------
> > > > > > > > > > > > ----
> > > > > > > > > > > > ------
> > > > > > > > > > > > ------
> > > > > > > > > > > > ------
> > > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apac
> > > > > > > > > > > > he.o
> > > > > > > > > > > > rg
> > > > > > > > > > > > For additional commands, e-mail: dev-help@qpid.ap
> > > > > > > > > > > > ache
> > > > > > > > > > > > .org
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > -------------------------------------------------
> > > > > > > > > > > ----
> > > > > > > > > > > ------
> > > > > > > > > > > ------
> > > > > > > > > > > ----
> > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache
> > > > > > > > > > > .org
> > > > > > > > > > > For additional commands, e-mail: dev-help@qpid.apac
> > > > > > > > > > > he.o
> > > > > > > > > > > rg
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -----------------------------------------------------
> > > > > > > > > ----
> > > > > > > > > ------
> > > > > > > > > ------
> > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > > > For additional commands, e-mail: dev-help@qpid.apache.o
> > > > > > > > > rg
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------
> > > > > > > ----
> > > > > > > --------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > > >
> > > > > >
> > > > > >
> > > > > > -----------------------------------------------------------
> > > > > > ----
> > > > > > ------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > -K
> > > > >
> > > > > -------------------------------------------------------------
> > > > > ----
> > > > > ----
> > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > > >
> > > >
> > > > ---------------------------------------------------------------
> > > > ------
> > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > >
> > >
> > >
> > > -----------------------------------------------------------------
> > > ----
> > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: users-help@qpid.apache.org
> > >
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Alan Conway <ac...@redhat.com>.

On Mon, 2016-10-17 at 17:08 +0100, Rob Godfrey wrote:
> If this API is not also making any decision on the (local) "timeout
> threshold" then that seems eminently reasonable (and what anyone with
> a
> passing familiarity of the spec would expect).
> 
> If the API is using the same call to also determine when the library
> will
> actually consider the peer has ceased to function, then I would be
> more
> concerned (albeit in an abstract way as I'm not actually currently
> using
> the API at all :-) ).
> 

Here's what I am considering for Go:

Called by the end initiating the connection:

// Heartbeat returns a ConnectionOption that requests the maximum delay
// between sending frames for the remote peer. If we don't receive any frames
// within 2*delay we will close the connection.
//
func Heartbeat(delay time.Duration) ConnectionOption {
	// Proton-C divides the idle-timeout by 2 before sending, so compensate.
	return func(c *connection) { c.engine.Transport().SetIdleTimeout(2 * delay) }
}

Called to query an incoming connection:

	// Heartbeat is the maximum delay between sending frames that the remote peer
	// has requested of us. If the interval expires an empty "heartbeat" frame
	// will be sent automatically to keep the connection open.
	Heartbeat() time.Duration

Key point is there is just *one* number that is passed to the API, sent
on the wire, and available to the other end, and it has a clearly
described meaning. The built-in automatic behavior is also described.

IMO using "idle timeout" to mean one thing at one end of the
connection, and a different thing at the other (and wait, which one
goes on the wire again?) is insane.

> -- Rob
> 
> On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> 
> > 
> > On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> > > 
> > > I don't think it's a bug - it's a completely valid (if chatty)
> > > choice.
> > > 
> > > To my mind the semantics of the field are this:
> > > 
> > > The sender of the open frame is saying "if I do not receive any
> > > data
> > > for this period of time, I reserve the right to assume that you
> > > are
> > > no
> > > longer functioning correctly"
> > > On receiving this information, the peer should decide on a
> > > strategy
> > > that ensures to the best of its ability, that if it is
> > > functioning
> > > normally it will be generating data sufficiently frequently that
> > > the
> > > condition does not occur.��If this peer knows that it is
> > > susceptible
> > > to delays outside its direct control (such as garbage collection
> > > pauses) it should take account of this in how often it schedules
> > > sending heartbeat frames.
> > > At the same time, the sender of this value should give some
> > > leeway
> > > before actually shutting down to account for unexpected transport
> > > delays, or processing delays on its side.��Since it is unaware of
> > > the
> > > nature of the allowances made at the peer, it may decide to be
> > > particularly generous in the leeway it grants.
> > > 
> > > I think the spec mentioning a ratio (such as twice) is massively
> > > unhelpful.
> > > 
> > > In terms of the API - what is the applications intent in setting
> > > the
> > > value?��Is the intent to ensure transport activity within a
> > > specific
> > > period, or to set a hard limit on how long the library will wait
> > > before generating some sort of timeout exception?
> > 
> > My guess is that in terms of the API, the application's intent in
> > calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> > value called "idle-timeout" to T. I don't think anybody's first
> > reading
> > of the API doc or the spec would suggest to them that if they want
> > to
> > set a wire timeout of T they actually need to call
> > set_idle_timeout(T*2) on one side of the connection and it will
> > magically come out as T on the other.
> > 
> > Spec wording aside, does anybody actually disagree with that?
> > 
> > 
> > > 
> > > -- Rob
> > > 
> > > 
> > > On 30 September 2016 at 14:25, Ken Giusti <kg...@redhat.com>
> > > wrote:
> > > > 
> > > > 
> > > > CC'ing user's list
> > > > 
> > > > Given the interpretations of the spec discussed below is the
> > > > way
> > > > Proton handles this functionality inconsistent with itself?
> > > > 
> > > > 1) Proton sends an open frame with _half_ the value of the
> > > > timeout
> > > > as set by the application.
> > > > 2) Proton interprets the idle-timeout in the received open as
> > > > the
> > > > actual time out interval, and pessimistically ;) generates
> > > > heartbeats as 1/2 that value.
> > > > 
> > > > Is this a bug in the Proton implementation?
> > > > 
> > > > -K
> > > > 
> > > > ----- Original Message -----
> > > > > 
> > > > > 
> > > > > From: "Alan Conway" <ac...@redhat.com>
> > > > > To: dev@qpid.apache.org
> > > > > Sent: Thursday, September 29, 2016 9:11:51 AM
> > > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > > intervals.
> > > > > 
> > > > > On Wed, 2016-09-28 at 22:33 +0100, Robbie Gemmell wrote:
> > > > > > 
> > > > > > 
> > > > > > The spec unfortunately says what it says, and even if there
> > > > > > were
> > > > > > scope
> > > > > > to update it, I'm not sure there is a way to change it to
> > > > > > address the
> > > > > > main issue (which regardless of any other clarity issues,
> > > > > > is I
> > > > > > think
> > > > > > just that it only says you SHOULD advertise half your
> > > > > > 'actual
> > > > > > timeout') that wouldn't result in an 'incompatible change'
> > > > > > of
> > > > > > behaviour.
> > > > > 
> > > > > I look at this differently. We agree on the semantics of
> > > > > idle-
> > > > > time-out
> > > > > defined by the spec. Now forget the spec wording, it is easy
> > > > > to
> > > > > explain
> > > > > clearly what the semantics are. It is not "half your actual
> > > > > time
> > > > > out",
> > > > > it is the *exact* frame rate you expect from your peer. The
> > > > > peer's
> > > > > *only* task is to respect that frame rate, they do not need
> > > > > any
> > > > > more
> > > > > information than that - they certainly don't need to guess at
> > > > > what your
> > > > > margin for error is for keeping the connection open.
> > > > > 
> > > > > The margin for error is an implementation detail that does
> > > > > *not*
> > > > > need
> > > > > to be advertised. Double the frame rate is a reasonable
> > > > > general
> > > > > recommendation, but the ideal margin depends on latency
> > > > > variability in
> > > > > the system, you can imagine systems where it might be better
> > > > > to
> > > > > have a
> > > > > larger or a smaller margin.
> > > > > 
> > > > > The name 'idle-time-out' is not an ideal choice, 'heartbeat'
> > > > > or
> > > > > 'max-
> > > > > frame-delay' might be better. But once we know what it means,
> > > > > the
> > > > > semantics are perfectly clear.
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > Knowing the exact enforced timeout could certainly be
> > > > > > clearer
> > > > > > in some
> > > > > > ways, but if the period you had to send after wasnt defined
> > > > > > (e.g
> > > > > > half)
> > > > > > then it would essentially leave you with the same problem
> > > > > > as
> > > > > > now:
> > > > > > deciding exactly how early you should send heartbeat frames
> > > > > > to
> > > > > > avoid
> > > > > > a
> > > > > > [more-precisely-known] timeout.
> > > > > 
> > > > > The only important thing is to agree on the meaning of the
> > > > > advertised
> > > > > value: is it the frame rate or the connection close
> > > > > threshold?
> > > > > The work
> > > > > �of adjusting at one end or the other is equivalent.
> > > > > 
> > > > > I think we agree it is currently defined as the frame rate, I
> > > > > see
> > > > > no
> > > > > benefit in trying to change the meaning. (There would be no
> > > > > drawback
> > > > > either if the spec was not already published and implemented
> > > > > by
> > > > > multiple implementors, but it is)
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > With what the spec says we effectively have a lower-bound
> > > > > > on
> > > > > > the
> > > > > > actual-timeout (if cases the peer didnt half their actual-
> > > > > > timeout
> > > > > > before advertising a number) and an upper bound (if they
> > > > > > did)
> > > > > > due to
> > > > > > the use of SHOULD as opposed to MUST or some other more
> > > > > > fixed
> > > > > > definition of behaviour. So we currently try to send frames
> > > > > > often
> > > > > > enough to satisfy that lower bound, i.e the number we
> > > > > > received.
> > > > > > That
> > > > > > essentially reduces it to the the same problem as if we
> > > > > > were
> > > > > > trying
> > > > > > to
> > > > > > satisfy a theoretical exactly-known actual-timeout (just
> > > > > > using
> > > > > > a
> > > > > > smaller number that we dont want to use, because we
> > > > > > followed
> > > > > > the
> > > > > > specs
> > > > > > recommendation). Currently we do this by sending frames
> > > > > > after
> > > > > > half
> > > > > > the
> > > > > > advertised period, i.e. what we know is actually a quarter
> > > > > > of
> > > > > > the
> > > > > > actual-timeout enforced by peers such as proton which are
> > > > > > following
> > > > > > the specs recommendations. We could choose to do it less
> > > > > > often
> > > > > > than
> > > > > > that by reducing how conservative it is, e.g use 75+% of
> > > > > > advertised
> > > > > > value instead for example, which should have no impact in
> > > > > > the
> > > > > > cases
> > > > > > where peers had followed the recommendation (as we would
> > > > > > still
> > > > > > be
> > > > > > sending more than twice as quickly as needed to satisfy
> > > > > > their
> > > > > > actual-timeout), but would put us closer to spurious
> > > > > > timeout (
> > > > > > if e.g
> > > > > > there was a delay in delivering/processing the heartbeat)
> > > > > > against any
> > > > > > peers that didn't follow the recommendation.
> > > > > > 
> > > > > > On 28 September 2016 at 21:26, Justin Ross <justin.ross@gma
> > > > > > il.c
> > > > > > om>
> > > > > > wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > IMO, the overall picture is simpler, and easier to
> > > > > > > explain to
> > > > > > > third
> > > > > > > parties, if we go the way Ken suggested.��When a remote
> > > > > > > peer
> > > > > > > sends
> > > > > > > you an
> > > > > > > idle timeout value, it is an expression of an (actual,
> > > > > > > not
> > > > > > > simply
> > > > > > > "advertised") guarantee - "I will expire the connection
> > > > > > > after
> > > > > > > X
> > > > > > > time
> > > > > > > without receiving a frame from you".
> > > > > > > 
> > > > > > > We could also legitimately go the direction you
> > > > > > > suggest.��*But* its
> > > > > > > name is
> > > > > > > "idle timeout".��We can't easily change the name.��I
> > > > > > > think we
> > > > > > > should take
> > > > > > > the spec text that goes with the name, and the behavior
> > > > > > > of
> > > > > > > our
> > > > > > > components,
> > > > > > > firmly in the direction Ken suggests.
> > > > > > > 
> > > > > > > Off topic: why is this on the dev list?
> > > > > > > 
> > > > > > > 
> > > > > > > On Wed, Sep 28, 2016 at 12:36 PM, Alan Conway <aconway@re
> > > > > > > dhat
> > > > > > > .com>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I've had a hand in the way Proton/C interprets the
> > > > > > > > > meaning of
> > > > > > > > > 'idle-
> > > > > > > > > timeout' and I've never liked the solution.��I think
> > > > > > > > > Proton/C's
> > > > > > > > > behavior is not 'pessimistic' as much as it is
> > > > > > > > > 'conservative'
> > > > > > > > > for the
> > > > > > > > > sake of interoperability.��This, unfortunately ends
> > > > > > > > > up
> > > > > > > > > with a
> > > > > > > > > needless idle frame chattiness when both ends are
> > > > > > > > > Proton-
> > > > > > > > > based.
> > > > > > > > > 
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > From: "Rob Godfrey" <ro...@gmail.com>
> > > > > > > > > > To: "qpid" <de...@qpid.apache.org>
> > > > > > > > > > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > > > > > > > > > Subject: Re: API and terms: idle-time-out and
> > > > > > > > > > heartbeat
> > > > > > > > > > intervals.
> > > > > > > > > > 
> > > > > > > > > > I agree that specifying that the communicated
> > > > > > > > > > figure
> > > > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > "half"
> > > > > > > > > > the "actual" timeout was a mistake.
> > > > > > > > > > 
> > > > > > > > > > What the spec should have tried to communicate is
> > > > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > sender
> > > > > > > > > > should communicate a value somewhat less than the
> > > > > > > > > > period it
> > > > > > > > > > uses to
> > > > > > > > > > determine that the connection has actually timed-
> > > > > > > > > > out to
> > > > > > > > > > allow
> > > > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > receiver to process and emit a heartbeat frame.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Wouldn't it be much clearer to simply send the
> > > > > > > > > _actual_
> > > > > > > > > idle
> > > > > > > > > timeout
> > > > > > > > > value?
> > > > > > > > 
> > > > > > > > My read is that is exactly what it does: It sends the
> > > > > > > > max
> > > > > > > > time
> > > > > > > > that the
> > > > > > > > *sender* of frames may be idle. The receiver of frames
> > > > > > > > SHOULD be
> > > > > > > > more
> > > > > > > > patient than that. The wording of the "discussion"
> > > > > > > > around
> > > > > > > > it and
> > > > > > > > the
> > > > > > > > choice of terms is a bit cloudy but, the text that
> > > > > > > > describes
> > > > > > > > idle-time-
> > > > > > > > out seems clear enough: it is the max interval between
> > > > > > > > sending
> > > > > > > > frames.
> > > > > > > > The frame receiver SHOULD wait longer that that before
> > > > > > > > closing,
> > > > > > > > and 2x
> > > > > > > > seems a reasonable suggestion, but that's for the impl
> > > > > > > > to
> > > > > > > > decide.
> > > > > > > > 
> > > > > > > > It's weird that it says "idle-time-out should be half
> > > > > > > > the
> > > > > > > > threshold"
> > > > > > > > instead of "the threshold should be twice the idle-
> > > > > > > > time-
> > > > > > > > out" but
> > > > > > > > it's
> > > > > > > > logically equivalent.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Having the spec suggest "communicating a value
> > > > > > > > > *somewhat
> > > > > > > > > less*"
> > > > > > > > 
> > > > > > > > The wording is odd but the semantics are you
> > > > > > > > communicate
> > > > > > > > *exactly* the
> > > > > > > > max frame delay you want and then you SHOULD set your
> > > > > > > > connection
> > > > > > > > close
> > > > > > > > threshold to something bigger. The other end doesn't
> > > > > > > > need
> > > > > > > > to know
> > > > > > > > how
> > > > > > > > much bigger, they just need to know what rate to send
> > > > > > > > frames.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > [emphasis mine] leaves the implementation open for
> > > > > > > > > interpretation -
> > > > > > > > > which is exactly how we got into this mess in the
> > > > > > > > > first
> > > > > > > > > place.��Developers are a smart bunch - they know that
> > > > > > > > > keep
> > > > > > > > > alive
> > > > > > > > > traffic will have to be sent frequently enough to
> > > > > > > > > prevent
> > > > > > > > > idle
> > > > > > > > > timeout.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > �Similarly the sender
> > > > > > > > > > should ensure that a frame has been emitted well
> > > > > > > > > > within
> > > > > > > > > > the
> > > > > > > > > > timeout
> > > > > > > > > > period to allow for any communication / processing
> > > > > > > > > > delay.
> > > > > > > > > 
> > > > > > > > > Agreed - perfectly acceptable for the spec to point
> > > > > > > > > this
> > > > > > > > > out.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > �In practice
> > > > > > > > > > these "wiggle room" factors should not be
> > > > > > > > > > determined by
> > > > > > > > > > the
> > > > > > > > > > application level timeout setting but by sensible
> > > > > > > > > > calculations on
> > > > > > > > > > transport delay variance / processing time,
> > > > > > > > > > etc...��these
> > > > > > > > > > calculation
> > > > > > > > > > may differ between different use-cases /
> > > > > > > > > > environments
> > > > > > > > > > (for
> > > > > > > > > > example
> > > > > > > > > > in
> > > > > > > > > > a low latency / real-time environment you may be
> > > > > > > > > > able
> > > > > > > > > > to make
> > > > > > > > > > hard
> > > > > > > > > > guarantees about the number of milliseconds that
> > > > > > > > > > communication /
> > > > > > > > > > processing delay will take... on the other hand if
> > > > > > > > > > you
> > > > > > > > > > are
> > > > > > > > > > using an
> > > > > > > > > > interpreted language with stop-the-world garbage
> > > > > > > > > > collection
> > > > > > > > > > you may
> > > > > > > > > > not be able to say much better than the delay
> > > > > > > > > > should be
> > > > > > > > > > less
> > > > > > > > > > than
> > > > > > > > > > 30s
> > > > > > > > > > or whatever).
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Yes - very important things to keep in mind when
> > > > > > > > > implementing
> > > > > > > > > this.��But the spec shouldn't be making these
> > > > > > > > > suggestions
> > > > > > > > > for
> > > > > > > > > different implementation options. The spec should be
> > > > > > > > > as
> > > > > > > > > concise
> > > > > > > > > as
> > > > > > > > > possible about the mandated behavior, and leave the
> > > > > > > > > implementation to
> > > > > > > > > the developers.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I think application level APIs should be in terms
> > > > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > timeouts
> > > > > > > > > > that
> > > > > > > > > > will affect the application.��The AMQP library
> > > > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > massaging
> > > > > > > > > > those numbers in such a way that they can fulfil
> > > > > > > > > > the
> > > > > > > > > > application
> > > > > > > > > > requirements.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Agreed.��Now, is there _any_ way we can suggest an
> > > > > > > > > update
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > spec?��Perhaps an errata, etc?
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > -- Rob
> > > > > > > > > > 
> > > > > > > > > > On 28 September 2016 at 10:42, Robbie Gemmell
> > > > > > > > > > <robbie.gemmell
> > > > > > > > > > @gmail
> > > > > > > > > > .com>
> > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On 27 September 2016 at 22:24, Alan Conway <aconw
> > > > > > > > > > > ay@r
> > > > > > > > > > > edhat.
> > > > > > > > > > > com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I want to clarify and document the meaning of
> > > > > > > > > > > > > these
> > > > > > > > > > > > > terms for
> > > > > > > > > > > > > our
> > > > > > > > > > > > > APIs,
> > > > > > > > > > > > > presently I can't find anywhere where they
> > > > > > > > > > > > > are
> > > > > > > > > > > > > documented
> > > > > > > > > > > > > clearly.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The AMQP spec says: "Each peer has its own
> > > > > > > > > > > > > (independent) idle
> > > > > > > > > > > > > timeout.
> > > > > > > > > > > > > At connection open each peer communicates the
> > > > > > > > > > > > > maximum
> > > > > > > > > > > > > period between activity (frames) on the
> > > > > > > > > > > > > connection that
> > > > > > > > > > > > > it
> > > > > > > > > > > > > desires
> > > > > > > > > > > > > from
> > > > > > > > > > > > > its partner.The open frame carries the
> > > > > > > > > > > > > idletime-
> > > > > > > > > > > > > out
> > > > > > > > > > > > > field for this purpose. To avoid spurious
> > > > > > > > > > > > > timeouts, the
> > > > > > > > > > > > > value
> > > > > > > > > > > > > in
> > > > > > > > > > > > > idle-
> > > > > > > > > > > > > time-out SHOULD be half the peer\u2019s
> > > > > > > > > > > > > actual timeout threshold."
> > > > > > > > > > > > > 
> > > > > > > > > > > > > In other words: if I send you an "open" frame
> > > > > > > > > > > > > with
> > > > > > > > > > > > > idle-time-
> > > > > > > > > > > > > out=N
> > > > > > > > > > > > > that
> > > > > > > > > > > > > means *you* should not wait for longer than N
> > > > > > > > > > > > > milliseconds to
> > > > > > > > > > > > > send a
> > > > > > > > > > > > > frame to me. It does not mean *I* will close
> > > > > > > > > > > > > the
> > > > > > > > > > > > > connection
> > > > > > > > > > > > > after N
> > > > > > > > > > > > > milliseconds, I SHOULD be more patient and
> > > > > > > > > > > > > wait
> > > > > > > > > > > > > for N*2
> > > > > > > > > > > > > ms to
> > > > > > > > > > > > > avoid
> > > > > > > > > > > > > closing prematurely due to minor timing
> > > > > > > > > > > > > wobbles.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I think the choice of name is slightly
> > > > > > > > > > > > > ambiguous
> > > > > > > > > > > > > but
> > > > > > > > > > > > > the spec
> > > > > > > > > > > > > is
> > > > > > > > > > > > > clear
> > > > > > > > > > > > > on the semantics, so it's important to
> > > > > > > > > > > > > document
> > > > > > > > > > > > > it to
> > > > > > > > > > > > > remove
> > > > > > > > > > > > > the
> > > > > > > > > > > > > ambiguity.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Anybody disagree?
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Sigh. Sadly proton-C interprets "idle-timeout"
> > > > > > > > > > > > differently
> > > > > > > > > > > > depending on
> > > > > > > > > > > > which end of the connection you are on:
> > > > > > > > > > > > 
> > > > > > > > > > > > ������// as per the recommendation in the spec,
> > > > > > > > > > > > advertise
> > > > > > > > > > > > half
> > > > > > > > > > > > our
> > > > > > > > > > > > ������// actual timeout to the remote
> > > > > > > > > > > > ������const pn_millis_t idle_timeout =
> > > > > > > > > > > > transport-
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > local_idle_timeout
> > > > > > > > > > > > ����������? (transport->local_idle_timeout/2)
> > > > > > > > > > > > ����������: 0;
> > > > > > > > > > > > 
> > > > > > > > > > > > So in proton, pn_set_idle_timeout does NOT mean
> > > > > > > > > > > > set
> > > > > > > > > > > > the
> > > > > > > > > > > > AMQP
> > > > > > > > > > > > idle-
> > > > > > > > > > > > timeout value, it means set the local "receive
> > > > > > > > > > > > timeout"
> > > > > > > > > > > > value
> > > > > > > > > > > > and send
> > > > > > > > > > > > half that as the AMQP "send timeout" for the
> > > > > > > > > > > > peer.
> > > > > > > > > > > > 
> > > > > > > > > > > > I'm tempted to use a new term in the Go API:
> > > > > > > > > > > > "heartbeat".
> > > > > > > > > > > > To me
> > > > > > > > > > > > that
> > > > > > > > > > > > clearly means the "send timeout" (hearts beat,
> > > > > > > > > > > > they
> > > > > > > > > > > > don't
> > > > > > > > > > > > listen for
> > > > > > > > > > > > beats) so it coincides with the meaning of the
> > > > > > > > > > > > AMQP
> > > > > > > > > > > > "idle-
> > > > > > > > > > > > timeout", but
> > > > > > > > > > > > without the ambiguity that is exacerbated by
> > > > > > > > > > > > proton
> > > > > > > > > > > > interpreting it
> > > > > > > > > > > > both ways.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Proton may seem to behave differently on each
> > > > > > > > > > > end,
> > > > > > > > > > > but I
> > > > > > > > > > > don't
> > > > > > > > > > > think
> > > > > > > > > > > its necessarily a bad thing that it does, and it
> > > > > > > > > > > is
> > > > > > > > > > > also I
> > > > > > > > > > > think
> > > > > > > > > > > largely just reflecting an annoying bit in the
> > > > > > > > > > > spec
> > > > > > > > > > > around
> > > > > > > > > > > this
> > > > > > > > > > > where
> > > > > > > > > > > different behaviours are allowed for, whereas it
> > > > > > > > > > > would be
> > > > > > > > > > > easier
> > > > > > > > > > > if it
> > > > > > > > > > > had less wiggle room.
> > > > > > > > > > > 
> > > > > > > > > > > The transport setter/getter for the local timeout
> > > > > > > > > > > takes the
> > > > > > > > > > > 'actual
> > > > > > > > > > > timeout' and then sends half of it as the
> > > > > > > > > > > advertised
> > > > > > > > > > > value
> > > > > > > > > > > in the
> > > > > > > > > > > Open
> > > > > > > > > > > sent. This makes a certain amount of sense since
> > > > > > > > > > > it
> > > > > > > > > > > ensures
> > > > > > > > > > > that
> > > > > > > > > > > appropriate behaviour is actually satisfied,
> > > > > > > > > > > rather
> > > > > > > > > > > than
> > > > > > > > > > > expecting the
> > > > > > > > > > > user to ensure they only give half the value they
> > > > > > > > > > > really
> > > > > > > > > > > want for
> > > > > > > > > > > their actual timeout. The getter for the remote
> > > > > > > > > > > timeout
> > > > > > > > > > > value on
> > > > > > > > > > > the
> > > > > > > > > > > other hand returns the advertised value from the
> > > > > > > > > > > Open
> > > > > > > > > > > that
> > > > > > > > > > > is
> > > > > > > > > > > received. I expect it does that since it cant
> > > > > > > > > > > actually ever
> > > > > > > > > > > return the
> > > > > > > > > > > remotes 'actual timeout' without making an
> > > > > > > > > > > assumption, i.e
> > > > > > > > > > > that
> > > > > > > > > > > they
> > > > > > > > > > > did in fact advertise half (or less) of their
> > > > > > > > > > > actual
> > > > > > > > > > > timeout,
> > > > > > > > > > > which
> > > > > > > > > > > the spec only says that they SHOULD do.
> > > > > > > > > > > 
> > > > > > > > > > > Yes the local setter taking the advertised value
> > > > > > > > > > > may
> > > > > > > > > > > have
> > > > > > > > > > > been
> > > > > > > > > > > better
> > > > > > > > > > > for method consistency with the remote getter. On
> > > > > > > > > > > the
> > > > > > > > > > > other
> > > > > > > > > > > hand,
> > > > > > > > > > > sending of necessary heartbeats is handled
> > > > > > > > > > > directly
> > > > > > > > > > > by the
> > > > > > > > > > > transport
> > > > > > > > > > > during the tick process, so users may not
> > > > > > > > > > > necessarily
> > > > > > > > > > > even
> > > > > > > > > > > use
> > > > > > > > > > > the
> > > > > > > > > > > getter themselves, and proton uses that remote
> > > > > > > > > > > value
> > > > > > > > > > > internally
> > > > > > > > > > > by
> > > > > > > > > > > pessimistically halfing it to account for the
> > > > > > > > > > > case
> > > > > > > > > > > that
> > > > > > > > > > > folks on
> > > > > > > > > > > the
> > > > > > > > > > > other end did not advertise half their actual
> > > > > > > > > > > timeout
> > > > > > > > > > > (since the
> > > > > > > > > > > spec
> > > > > > > > > > > doesnt require that they do). Side note: proton
> > > > > > > > > > > could
> > > > > > > > > > > arguably be
> > > > > > > > > > > less
> > > > > > > > > > > pessimistic here and go for say a percentage much
> > > > > > > > > > > nearer
> > > > > > > > > > > the full
> > > > > > > > > > > advertised value, but then you'd probably need to
> > > > > > > > > > > start
> > > > > > > > > > > guaging
> > > > > > > > > > > how
> > > > > > > > > > > close is too close.
> > > > > > > > > > > 
> > > > > > > > > > > I think ensuring the doccumentation on the
> > > > > > > > > > > methods is
> > > > > > > > > > > clear
> > > > > > > > > > > what
> > > > > > > > > > > they
> > > > > > > > > > > do is sufficient enough here. I actually prefer
> > > > > > > > > > > idle-
> > > > > > > > > > > timeout as
> > > > > > > > > > > an
> > > > > > > > > > > name rather than heartbeat due to the way this
> > > > > > > > > > > all
> > > > > > > > > > > works.
> > > > > > > > > > > Since
> > > > > > > > > > > you
> > > > > > > > > > > only tell the other side [half] your timeout, you
> > > > > > > > > > > dont
> > > > > > > > > > > actually
> > > > > > > > > > > have
> > > > > > > > > > > direct control over when they send any needed
> > > > > > > > > > > empty
> > > > > > > > > > > frames
> > > > > > > > > > > to
> > > > > > > > > > > satisfy
> > > > > > > > > > > it (as the above shows, we might send them more
> > > > > > > > > > > often
> > > > > > > > > > > than
> > > > > > > > > > > they
> > > > > > > > > > > require) and 'heartbeat' might seem to imply that
> > > > > > > > > > > you
> > > > > > > > > > > do,
> > > > > > > > > > > and
> > > > > > > > > > > possibly
> > > > > > > > > > > even that they need be sent at that period all
> > > > > > > > > > > the
> > > > > > > > > > > time
> > > > > > > > > > > even
> > > > > > > > > > > despite
> > > > > > > > > > > regular traffic, which is not the case.
> > > > > > > > > > > 
> > > > > > > > > > > Robbie
> > > > > > > > > > > 
> > > > > > > > > > > -----------------------------------------------
> > > > > > > > > > > ----
> > > > > > > > > > > ------
> > > > > > > > > > > ------
> > > > > > > > > > > ------
> > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apac
> > > > > > > > > > > he.o
> > > > > > > > > > > rg
> > > > > > > > > > > For additional commands, e-mail: dev-help@qpid.ap
> > > > > > > > > > > ache
> > > > > > > > > > > .org
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > -------------------------------------------------
> > > > > > > > > > ----
> > > > > > > > > > ------
> > > > > > > > > > ------
> > > > > > > > > > ----
> > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache
> > > > > > > > > > .org
> > > > > > > > > > For additional commands, e-mail: dev-help@qpid.apac
> > > > > > > > > > he.o
> > > > > > > > > > rg
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -----------------------------------------------------
> > > > > > > > ----
> > > > > > > > ------
> > > > > > > > ------
> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > > For additional commands, e-mail: dev-help@qpid.apache.o
> > > > > > > > rg
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > > > ---------------------------------------------------------
> > > > > > ----
> > > > > > --------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > > 
> > > > > 
> > > > > 
> > > > > -----------------------------------------------------------
> > > > > ----
> > > > > ------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > 
> > > > > 
> > > > 
> > > > --
> > > > -K
> > > > 
> > > > -------------------------------------------------------------
> > > > ----
> > > > ----
> > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > > 
> > > 
> > > ---------------------------------------------------------------
> > > ------
> > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: users-help@qpid.apache.org
> > > 
> > 
> > 
> > -----------------------------------------------------------------
> > ----
> > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > For additional commands, e-mail: users-help@qpid.apache.org
> > 
> > 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Alan Conway <ac...@redhat.com>.

On Mon, 2016-10-17 at 17:08 +0100, Rob Godfrey wrote:
> If this API is not also making any decision on the (local) "timeout
> threshold" then that seems eminently reasonable (and what anyone with
> a
> passing familiarity of the spec would expect).
> 
> If the API is using the same call to also determine when the library
> will
> actually consider the peer has ceased to function, then I would be
> more
> concerned (albeit in an abstract way as I'm not actually currently
> using
> the API at all :-) ).
> 

Here's what I am considering for Go:

Called by the end initiating the connection:

// Heartbeat returns a ConnectionOption that requests the maximum delay
// between sending frames for the remote peer. If we don't receive any frames
// within 2*delay we will close the connection.
//
func Heartbeat(delay time.Duration) ConnectionOption {
	// Proton-C divides the idle-timeout by 2 before sending, so compensate.
	return func(c *connection) { c.engine.Transport().SetIdleTimeout(2 * delay) }
}

Called to query an incoming connection:

	// Heartbeat is the maximum delay between sending frames that the remote peer
	// has requested of us. If the interval expires an empty "heartbeat" frame
	// will be sent automatically to keep the connection open.
	Heartbeat() time.Duration

Key point is there is just *one* number that is passed to the API, sent
on the wire, and available to the other end, and it has a clearly
described meaning. The built-in automatic behavior is also described.

IMO using "idle timeout" to mean one thing at one end of the
connection, and a different thing at the other (and wait, which one
goes on the wire again?) is insane.

> -- Rob
> 
> On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> 
> > 
> > On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> > > 
> > > I don't think it's a bug - it's a completely valid (if chatty)
> > > choice.
> > > 
> > > To my mind the semantics of the field are this:
> > > 
> > > The sender of the open frame is saying "if I do not receive any
> > > data
> > > for this period of time, I reserve the right to assume that you
> > > are
> > > no
> > > longer functioning correctly"
> > > On receiving this information, the peer should decide on a
> > > strategy
> > > that ensures to the best of its ability, that if it is
> > > functioning
> > > normally it will be generating data sufficiently frequently that
> > > the
> > > condition does not occur.��If this peer knows that it is
> > > susceptible
> > > to delays outside its direct control (such as garbage collection
> > > pauses) it should take account of this in how often it schedules
> > > sending heartbeat frames.
> > > At the same time, the sender of this value should give some
> > > leeway
> > > before actually shutting down to account for unexpected transport
> > > delays, or processing delays on its side.��Since it is unaware of
> > > the
> > > nature of the allowances made at the peer, it may decide to be
> > > particularly generous in the leeway it grants.
> > > 
> > > I think the spec mentioning a ratio (such as twice) is massively
> > > unhelpful.
> > > 
> > > In terms of the API - what is the applications intent in setting
> > > the
> > > value?��Is the intent to ensure transport activity within a
> > > specific
> > > period, or to set a hard limit on how long the library will wait
> > > before generating some sort of timeout exception?
> > 
> > My guess is that in terms of the API, the application's intent in
> > calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> > value called "idle-timeout" to T. I don't think anybody's first
> > reading
> > of the API doc or the spec would suggest to them that if they want
> > to
> > set a wire timeout of T they actually need to call
> > set_idle_timeout(T*2) on one side of the connection and it will
> > magically come out as T on the other.
> > 
> > Spec wording aside, does anybody actually disagree with that?
> > 
> > 
> > > 
> > > -- Rob
> > > 
> > > 
> > > On 30 September 2016 at 14:25, Ken Giusti <kg...@redhat.com>
> > > wrote:
> > > > 
> > > > 
> > > > CC'ing user's list
> > > > 
> > > > Given the interpretations of the spec discussed below is the
> > > > way
> > > > Proton handles this functionality inconsistent with itself?
> > > > 
> > > > 1) Proton sends an open frame with _half_ the value of the
> > > > timeout
> > > > as set by the application.
> > > > 2) Proton interprets the idle-timeout in the received open as
> > > > the
> > > > actual time out interval, and pessimistically ;) generates
> > > > heartbeats as 1/2 that value.
> > > > 
> > > > Is this a bug in the Proton implementation?
> > > > 
> > > > -K
> > > > 
> > > > ----- Original Message -----
> > > > > 
> > > > > 
> > > > > From: "Alan Conway" <ac...@redhat.com>
> > > > > To: dev@qpid.apache.org
> > > > > Sent: Thursday, September 29, 2016 9:11:51 AM
> > > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > > intervals.
> > > > > 
> > > > > On Wed, 2016-09-28 at 22:33 +0100, Robbie Gemmell wrote:
> > > > > > 
> > > > > > 
> > > > > > The spec unfortunately says what it says, and even if there
> > > > > > were
> > > > > > scope
> > > > > > to update it, I'm not sure there is a way to change it to
> > > > > > address the
> > > > > > main issue (which regardless of any other clarity issues,
> > > > > > is I
> > > > > > think
> > > > > > just that it only says you SHOULD advertise half your
> > > > > > 'actual
> > > > > > timeout') that wouldn't result in an 'incompatible change'
> > > > > > of
> > > > > > behaviour.
> > > > > 
> > > > > I look at this differently. We agree on the semantics of
> > > > > idle-
> > > > > time-out
> > > > > defined by the spec. Now forget the spec wording, it is easy
> > > > > to
> > > > > explain
> > > > > clearly what the semantics are. It is not "half your actual
> > > > > time
> > > > > out",
> > > > > it is the *exact* frame rate you expect from your peer. The
> > > > > peer's
> > > > > *only* task is to respect that frame rate, they do not need
> > > > > any
> > > > > more
> > > > > information than that - they certainly don't need to guess at
> > > > > what your
> > > > > margin for error is for keeping the connection open.
> > > > > 
> > > > > The margin for error is an implementation detail that does
> > > > > *not*
> > > > > need
> > > > > to be advertised. Double the frame rate is a reasonable
> > > > > general
> > > > > recommendation, but the ideal margin depends on latency
> > > > > variability in
> > > > > the system, you can imagine systems where it might be better
> > > > > to
> > > > > have a
> > > > > larger or a smaller margin.
> > > > > 
> > > > > The name 'idle-time-out' is not an ideal choice, 'heartbeat'
> > > > > or
> > > > > 'max-
> > > > > frame-delay' might be better. But once we know what it means,
> > > > > the
> > > > > semantics are perfectly clear.
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > Knowing the exact enforced timeout could certainly be
> > > > > > clearer
> > > > > > in some
> > > > > > ways, but if the period you had to send after wasnt defined
> > > > > > (e.g
> > > > > > half)
> > > > > > then it would essentially leave you with the same problem
> > > > > > as
> > > > > > now:
> > > > > > deciding exactly how early you should send heartbeat frames
> > > > > > to
> > > > > > avoid
> > > > > > a
> > > > > > [more-precisely-known] timeout.
> > > > > 
> > > > > The only important thing is to agree on the meaning of the
> > > > > advertised
> > > > > value: is it the frame rate or the connection close
> > > > > threshold?
> > > > > The work
> > > > > �of adjusting at one end or the other is equivalent.
> > > > > 
> > > > > I think we agree it is currently defined as the frame rate, I
> > > > > see
> > > > > no
> > > > > benefit in trying to change the meaning. (There would be no
> > > > > drawback
> > > > > either if the spec was not already published and implemented
> > > > > by
> > > > > multiple implementors, but it is)
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > With what the spec says we effectively have a lower-bound
> > > > > > on
> > > > > > the
> > > > > > actual-timeout (if cases the peer didnt half their actual-
> > > > > > timeout
> > > > > > before advertising a number) and an upper bound (if they
> > > > > > did)
> > > > > > due to
> > > > > > the use of SHOULD as opposed to MUST or some other more
> > > > > > fixed
> > > > > > definition of behaviour. So we currently try to send frames
> > > > > > often
> > > > > > enough to satisfy that lower bound, i.e the number we
> > > > > > received.
> > > > > > That
> > > > > > essentially reduces it to the the same problem as if we
> > > > > > were
> > > > > > trying
> > > > > > to
> > > > > > satisfy a theoretical exactly-known actual-timeout (just
> > > > > > using
> > > > > > a
> > > > > > smaller number that we dont want to use, because we
> > > > > > followed
> > > > > > the
> > > > > > specs
> > > > > > recommendation). Currently we do this by sending frames
> > > > > > after
> > > > > > half
> > > > > > the
> > > > > > advertised period, i.e. what we know is actually a quarter
> > > > > > of
> > > > > > the
> > > > > > actual-timeout enforced by peers such as proton which are
> > > > > > following
> > > > > > the specs recommendations. We could choose to do it less
> > > > > > often
> > > > > > than
> > > > > > that by reducing how conservative it is, e.g use 75+% of
> > > > > > advertised
> > > > > > value instead for example, which should have no impact in
> > > > > > the
> > > > > > cases
> > > > > > where peers had followed the recommendation (as we would
> > > > > > still
> > > > > > be
> > > > > > sending more than twice as quickly as needed to satisfy
> > > > > > their
> > > > > > actual-timeout), but would put us closer to spurious
> > > > > > timeout (
> > > > > > if e.g
> > > > > > there was a delay in delivering/processing the heartbeat)
> > > > > > against any
> > > > > > peers that didn't follow the recommendation.
> > > > > > 
> > > > > > On 28 September 2016 at 21:26, Justin Ross <justin.ross@gma
> > > > > > il.c
> > > > > > om>
> > > > > > wrote:
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > IMO, the overall picture is simpler, and easier to
> > > > > > > explain to
> > > > > > > third
> > > > > > > parties, if we go the way Ken suggested.��When a remote
> > > > > > > peer
> > > > > > > sends
> > > > > > > you an
> > > > > > > idle timeout value, it is an expression of an (actual,
> > > > > > > not
> > > > > > > simply
> > > > > > > "advertised") guarantee - "I will expire the connection
> > > > > > > after
> > > > > > > X
> > > > > > > time
> > > > > > > without receiving a frame from you".
> > > > > > > 
> > > > > > > We could also legitimately go the direction you
> > > > > > > suggest.��*But* its
> > > > > > > name is
> > > > > > > "idle timeout".��We can't easily change the name.��I
> > > > > > > think we
> > > > > > > should take
> > > > > > > the spec text that goes with the name, and the behavior
> > > > > > > of
> > > > > > > our
> > > > > > > components,
> > > > > > > firmly in the direction Ken suggests.
> > > > > > > 
> > > > > > > Off topic: why is this on the dev list?
> > > > > > > 
> > > > > > > 
> > > > > > > On Wed, Sep 28, 2016 at 12:36 PM, Alan Conway <aconway@re
> > > > > > > dhat
> > > > > > > .com>
> > > > > > > wrote:
> > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I've had a hand in the way Proton/C interprets the
> > > > > > > > > meaning of
> > > > > > > > > 'idle-
> > > > > > > > > timeout' and I've never liked the solution.��I think
> > > > > > > > > Proton/C's
> > > > > > > > > behavior is not 'pessimistic' as much as it is
> > > > > > > > > 'conservative'
> > > > > > > > > for the
> > > > > > > > > sake of interoperability.��This, unfortunately ends
> > > > > > > > > up
> > > > > > > > > with a
> > > > > > > > > needless idle frame chattiness when both ends are
> > > > > > > > > Proton-
> > > > > > > > > based.
> > > > > > > > > 
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > From: "Rob Godfrey" <ro...@gmail.com>
> > > > > > > > > > To: "qpid" <de...@qpid.apache.org>
> > > > > > > > > > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > > > > > > > > > Subject: Re: API and terms: idle-time-out and
> > > > > > > > > > heartbeat
> > > > > > > > > > intervals.
> > > > > > > > > > 
> > > > > > > > > > I agree that specifying that the communicated
> > > > > > > > > > figure
> > > > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > "half"
> > > > > > > > > > the "actual" timeout was a mistake.
> > > > > > > > > > 
> > > > > > > > > > What the spec should have tried to communicate is
> > > > > > > > > > that
> > > > > > > > > > the
> > > > > > > > > > sender
> > > > > > > > > > should communicate a value somewhat less than the
> > > > > > > > > > period it
> > > > > > > > > > uses to
> > > > > > > > > > determine that the connection has actually timed-
> > > > > > > > > > out to
> > > > > > > > > > allow
> > > > > > > > > > for
> > > > > > > > > > the
> > > > > > > > > > receiver to process and emit a heartbeat frame.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Wouldn't it be much clearer to simply send the
> > > > > > > > > _actual_
> > > > > > > > > idle
> > > > > > > > > timeout
> > > > > > > > > value?
> > > > > > > > 
> > > > > > > > My read is that is exactly what it does: It sends the
> > > > > > > > max
> > > > > > > > time
> > > > > > > > that the
> > > > > > > > *sender* of frames may be idle. The receiver of frames
> > > > > > > > SHOULD be
> > > > > > > > more
> > > > > > > > patient than that. The wording of the "discussion"
> > > > > > > > around
> > > > > > > > it and
> > > > > > > > the
> > > > > > > > choice of terms is a bit cloudy but, the text that
> > > > > > > > describes
> > > > > > > > idle-time-
> > > > > > > > out seems clear enough: it is the max interval between
> > > > > > > > sending
> > > > > > > > frames.
> > > > > > > > The frame receiver SHOULD wait longer that that before
> > > > > > > > closing,
> > > > > > > > and 2x
> > > > > > > > seems a reasonable suggestion, but that's for the impl
> > > > > > > > to
> > > > > > > > decide.
> > > > > > > > 
> > > > > > > > It's weird that it says "idle-time-out should be half
> > > > > > > > the
> > > > > > > > threshold"
> > > > > > > > instead of "the threshold should be twice the idle-
> > > > > > > > time-
> > > > > > > > out" but
> > > > > > > > it's
> > > > > > > > logically equivalent.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Having the spec suggest "communicating a value
> > > > > > > > > *somewhat
> > > > > > > > > less*"
> > > > > > > > 
> > > > > > > > The wording is odd but the semantics are you
> > > > > > > > communicate
> > > > > > > > *exactly* the
> > > > > > > > max frame delay you want and then you SHOULD set your
> > > > > > > > connection
> > > > > > > > close
> > > > > > > > threshold to something bigger. The other end doesn't
> > > > > > > > need
> > > > > > > > to know
> > > > > > > > how
> > > > > > > > much bigger, they just need to know what rate to send
> > > > > > > > frames.
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > [emphasis mine] leaves the implementation open for
> > > > > > > > > interpretation -
> > > > > > > > > which is exactly how we got into this mess in the
> > > > > > > > > first
> > > > > > > > > place.��Developers are a smart bunch - they know that
> > > > > > > > > keep
> > > > > > > > > alive
> > > > > > > > > traffic will have to be sent frequently enough to
> > > > > > > > > prevent
> > > > > > > > > idle
> > > > > > > > > timeout.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > �Similarly the sender
> > > > > > > > > > should ensure that a frame has been emitted well
> > > > > > > > > > within
> > > > > > > > > > the
> > > > > > > > > > timeout
> > > > > > > > > > period to allow for any communication / processing
> > > > > > > > > > delay.
> > > > > > > > > 
> > > > > > > > > Agreed - perfectly acceptable for the spec to point
> > > > > > > > > this
> > > > > > > > > out.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > �In practice
> > > > > > > > > > these "wiggle room" factors should not be
> > > > > > > > > > determined by
> > > > > > > > > > the
> > > > > > > > > > application level timeout setting but by sensible
> > > > > > > > > > calculations on
> > > > > > > > > > transport delay variance / processing time,
> > > > > > > > > > etc...��these
> > > > > > > > > > calculation
> > > > > > > > > > may differ between different use-cases /
> > > > > > > > > > environments
> > > > > > > > > > (for
> > > > > > > > > > example
> > > > > > > > > > in
> > > > > > > > > > a low latency / real-time environment you may be
> > > > > > > > > > able
> > > > > > > > > > to make
> > > > > > > > > > hard
> > > > > > > > > > guarantees about the number of milliseconds that
> > > > > > > > > > communication /
> > > > > > > > > > processing delay will take... on the other hand if
> > > > > > > > > > you
> > > > > > > > > > are
> > > > > > > > > > using an
> > > > > > > > > > interpreted language with stop-the-world garbage
> > > > > > > > > > collection
> > > > > > > > > > you may
> > > > > > > > > > not be able to say much better than the delay
> > > > > > > > > > should be
> > > > > > > > > > less
> > > > > > > > > > than
> > > > > > > > > > 30s
> > > > > > > > > > or whatever).
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Yes - very important things to keep in mind when
> > > > > > > > > implementing
> > > > > > > > > this.��But the spec shouldn't be making these
> > > > > > > > > suggestions
> > > > > > > > > for
> > > > > > > > > different implementation options. The spec should be
> > > > > > > > > as
> > > > > > > > > concise
> > > > > > > > > as
> > > > > > > > > possible about the mandated behavior, and leave the
> > > > > > > > > implementation to
> > > > > > > > > the developers.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I think application level APIs should be in terms
> > > > > > > > > > of
> > > > > > > > > > the
> > > > > > > > > > timeouts
> > > > > > > > > > that
> > > > > > > > > > will affect the application.��The AMQP library
> > > > > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > massaging
> > > > > > > > > > those numbers in such a way that they can fulfil
> > > > > > > > > > the
> > > > > > > > > > application
> > > > > > > > > > requirements.
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Agreed.��Now, is there _any_ way we can suggest an
> > > > > > > > > update
> > > > > > > > > to
> > > > > > > > > the
> > > > > > > > > spec?��Perhaps an errata, etc?
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > -- Rob
> > > > > > > > > > 
> > > > > > > > > > On 28 September 2016 at 10:42, Robbie Gemmell
> > > > > > > > > > <robbie.gemmell
> > > > > > > > > > @gmail
> > > > > > > > > > .com>
> > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > On 27 September 2016 at 22:24, Alan Conway <aconw
> > > > > > > > > > > ay@r
> > > > > > > > > > > edhat.
> > > > > > > > > > > com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I want to clarify and document the meaning of
> > > > > > > > > > > > > these
> > > > > > > > > > > > > terms for
> > > > > > > > > > > > > our
> > > > > > > > > > > > > APIs,
> > > > > > > > > > > > > presently I can't find anywhere where they
> > > > > > > > > > > > > are
> > > > > > > > > > > > > documented
> > > > > > > > > > > > > clearly.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > The AMQP spec says: "Each peer has its own
> > > > > > > > > > > > > (independent) idle
> > > > > > > > > > > > > timeout.
> > > > > > > > > > > > > At connection open each peer communicates the
> > > > > > > > > > > > > maximum
> > > > > > > > > > > > > period between activity (frames) on the
> > > > > > > > > > > > > connection that
> > > > > > > > > > > > > it
> > > > > > > > > > > > > desires
> > > > > > > > > > > > > from
> > > > > > > > > > > > > its partner.The open frame carries the
> > > > > > > > > > > > > idletime-
> > > > > > > > > > > > > out
> > > > > > > > > > > > > field for this purpose. To avoid spurious
> > > > > > > > > > > > > timeouts, the
> > > > > > > > > > > > > value
> > > > > > > > > > > > > in
> > > > > > > > > > > > > idle-
> > > > > > > > > > > > > time-out SHOULD be half the peer\u2019s
> > > > > > > > > > > > > actual timeout threshold."
> > > > > > > > > > > > > 
> > > > > > > > > > > > > In other words: if I send you an "open" frame
> > > > > > > > > > > > > with
> > > > > > > > > > > > > idle-time-
> > > > > > > > > > > > > out=N
> > > > > > > > > > > > > that
> > > > > > > > > > > > > means *you* should not wait for longer than N
> > > > > > > > > > > > > milliseconds to
> > > > > > > > > > > > > send a
> > > > > > > > > > > > > frame to me. It does not mean *I* will close
> > > > > > > > > > > > > the
> > > > > > > > > > > > > connection
> > > > > > > > > > > > > after N
> > > > > > > > > > > > > milliseconds, I SHOULD be more patient and
> > > > > > > > > > > > > wait
> > > > > > > > > > > > > for N*2
> > > > > > > > > > > > > ms to
> > > > > > > > > > > > > avoid
> > > > > > > > > > > > > closing prematurely due to minor timing
> > > > > > > > > > > > > wobbles.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I think the choice of name is slightly
> > > > > > > > > > > > > ambiguous
> > > > > > > > > > > > > but
> > > > > > > > > > > > > the spec
> > > > > > > > > > > > > is
> > > > > > > > > > > > > clear
> > > > > > > > > > > > > on the semantics, so it's important to
> > > > > > > > > > > > > document
> > > > > > > > > > > > > it to
> > > > > > > > > > > > > remove
> > > > > > > > > > > > > the
> > > > > > > > > > > > > ambiguity.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Anybody disagree?
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Sigh. Sadly proton-C interprets "idle-timeout"
> > > > > > > > > > > > differently
> > > > > > > > > > > > depending on
> > > > > > > > > > > > which end of the connection you are on:
> > > > > > > > > > > > 
> > > > > > > > > > > > ������// as per the recommendation in the spec,
> > > > > > > > > > > > advertise
> > > > > > > > > > > > half
> > > > > > > > > > > > our
> > > > > > > > > > > > ������// actual timeout to the remote
> > > > > > > > > > > > ������const pn_millis_t idle_timeout =
> > > > > > > > > > > > transport-
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > 
> > > > > > > > > > > > > local_idle_timeout
> > > > > > > > > > > > ����������? (transport->local_idle_timeout/2)
> > > > > > > > > > > > ����������: 0;
> > > > > > > > > > > > 
> > > > > > > > > > > > So in proton, pn_set_idle_timeout does NOT mean
> > > > > > > > > > > > set
> > > > > > > > > > > > the
> > > > > > > > > > > > AMQP
> > > > > > > > > > > > idle-
> > > > > > > > > > > > timeout value, it means set the local "receive
> > > > > > > > > > > > timeout"
> > > > > > > > > > > > value
> > > > > > > > > > > > and send
> > > > > > > > > > > > half that as the AMQP "send timeout" for the
> > > > > > > > > > > > peer.
> > > > > > > > > > > > 
> > > > > > > > > > > > I'm tempted to use a new term in the Go API:
> > > > > > > > > > > > "heartbeat".
> > > > > > > > > > > > To me
> > > > > > > > > > > > that
> > > > > > > > > > > > clearly means the "send timeout" (hearts beat,
> > > > > > > > > > > > they
> > > > > > > > > > > > don't
> > > > > > > > > > > > listen for
> > > > > > > > > > > > beats) so it coincides with the meaning of the
> > > > > > > > > > > > AMQP
> > > > > > > > > > > > "idle-
> > > > > > > > > > > > timeout", but
> > > > > > > > > > > > without the ambiguity that is exacerbated by
> > > > > > > > > > > > proton
> > > > > > > > > > > > interpreting it
> > > > > > > > > > > > both ways.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Proton may seem to behave differently on each
> > > > > > > > > > > end,
> > > > > > > > > > > but I
> > > > > > > > > > > don't
> > > > > > > > > > > think
> > > > > > > > > > > its necessarily a bad thing that it does, and it
> > > > > > > > > > > is
> > > > > > > > > > > also I
> > > > > > > > > > > think
> > > > > > > > > > > largely just reflecting an annoying bit in the
> > > > > > > > > > > spec
> > > > > > > > > > > around
> > > > > > > > > > > this
> > > > > > > > > > > where
> > > > > > > > > > > different behaviours are allowed for, whereas it
> > > > > > > > > > > would be
> > > > > > > > > > > easier
> > > > > > > > > > > if it
> > > > > > > > > > > had less wiggle room.
> > > > > > > > > > > 
> > > > > > > > > > > The transport setter/getter for the local timeout
> > > > > > > > > > > takes the
> > > > > > > > > > > 'actual
> > > > > > > > > > > timeout' and then sends half of it as the
> > > > > > > > > > > advertised
> > > > > > > > > > > value
> > > > > > > > > > > in the
> > > > > > > > > > > Open
> > > > > > > > > > > sent. This makes a certain amount of sense since
> > > > > > > > > > > it
> > > > > > > > > > > ensures
> > > > > > > > > > > that
> > > > > > > > > > > appropriate behaviour is actually satisfied,
> > > > > > > > > > > rather
> > > > > > > > > > > than
> > > > > > > > > > > expecting the
> > > > > > > > > > > user to ensure they only give half the value they
> > > > > > > > > > > really
> > > > > > > > > > > want for
> > > > > > > > > > > their actual timeout. The getter for the remote
> > > > > > > > > > > timeout
> > > > > > > > > > > value on
> > > > > > > > > > > the
> > > > > > > > > > > other hand returns the advertised value from the
> > > > > > > > > > > Open
> > > > > > > > > > > that
> > > > > > > > > > > is
> > > > > > > > > > > received. I expect it does that since it cant
> > > > > > > > > > > actually ever
> > > > > > > > > > > return the
> > > > > > > > > > > remotes 'actual timeout' without making an
> > > > > > > > > > > assumption, i.e
> > > > > > > > > > > that
> > > > > > > > > > > they
> > > > > > > > > > > did in fact advertise half (or less) of their
> > > > > > > > > > > actual
> > > > > > > > > > > timeout,
> > > > > > > > > > > which
> > > > > > > > > > > the spec only says that they SHOULD do.
> > > > > > > > > > > 
> > > > > > > > > > > Yes the local setter taking the advertised value
> > > > > > > > > > > may
> > > > > > > > > > > have
> > > > > > > > > > > been
> > > > > > > > > > > better
> > > > > > > > > > > for method consistency with the remote getter. On
> > > > > > > > > > > the
> > > > > > > > > > > other
> > > > > > > > > > > hand,
> > > > > > > > > > > sending of necessary heartbeats is handled
> > > > > > > > > > > directly
> > > > > > > > > > > by the
> > > > > > > > > > > transport
> > > > > > > > > > > during the tick process, so users may not
> > > > > > > > > > > necessarily
> > > > > > > > > > > even
> > > > > > > > > > > use
> > > > > > > > > > > the
> > > > > > > > > > > getter themselves, and proton uses that remote
> > > > > > > > > > > value
> > > > > > > > > > > internally
> > > > > > > > > > > by
> > > > > > > > > > > pessimistically halfing it to account for the
> > > > > > > > > > > case
> > > > > > > > > > > that
> > > > > > > > > > > folks on
> > > > > > > > > > > the
> > > > > > > > > > > other end did not advertise half their actual
> > > > > > > > > > > timeout
> > > > > > > > > > > (since the
> > > > > > > > > > > spec
> > > > > > > > > > > doesnt require that they do). Side note: proton
> > > > > > > > > > > could
> > > > > > > > > > > arguably be
> > > > > > > > > > > less
> > > > > > > > > > > pessimistic here and go for say a percentage much
> > > > > > > > > > > nearer
> > > > > > > > > > > the full
> > > > > > > > > > > advertised value, but then you'd probably need to
> > > > > > > > > > > start
> > > > > > > > > > > guaging
> > > > > > > > > > > how
> > > > > > > > > > > close is too close.
> > > > > > > > > > > 
> > > > > > > > > > > I think ensuring the doccumentation on the
> > > > > > > > > > > methods is
> > > > > > > > > > > clear
> > > > > > > > > > > what
> > > > > > > > > > > they
> > > > > > > > > > > do is sufficient enough here. I actually prefer
> > > > > > > > > > > idle-
> > > > > > > > > > > timeout as
> > > > > > > > > > > an
> > > > > > > > > > > name rather than heartbeat due to the way this
> > > > > > > > > > > all
> > > > > > > > > > > works.
> > > > > > > > > > > Since
> > > > > > > > > > > you
> > > > > > > > > > > only tell the other side [half] your timeout, you
> > > > > > > > > > > dont
> > > > > > > > > > > actually
> > > > > > > > > > > have
> > > > > > > > > > > direct control over when they send any needed
> > > > > > > > > > > empty
> > > > > > > > > > > frames
> > > > > > > > > > > to
> > > > > > > > > > > satisfy
> > > > > > > > > > > it (as the above shows, we might send them more
> > > > > > > > > > > often
> > > > > > > > > > > than
> > > > > > > > > > > they
> > > > > > > > > > > require) and 'heartbeat' might seem to imply that
> > > > > > > > > > > you
> > > > > > > > > > > do,
> > > > > > > > > > > and
> > > > > > > > > > > possibly
> > > > > > > > > > > even that they need be sent at that period all
> > > > > > > > > > > the
> > > > > > > > > > > time
> > > > > > > > > > > even
> > > > > > > > > > > despite
> > > > > > > > > > > regular traffic, which is not the case.
> > > > > > > > > > > 
> > > > > > > > > > > Robbie
> > > > > > > > > > > 
> > > > > > > > > > > -----------------------------------------------
> > > > > > > > > > > ----
> > > > > > > > > > > ------
> > > > > > > > > > > ------
> > > > > > > > > > > ------
> > > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apac
> > > > > > > > > > > he.o
> > > > > > > > > > > rg
> > > > > > > > > > > For additional commands, e-mail: dev-help@qpid.ap
> > > > > > > > > > > ache
> > > > > > > > > > > .org
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > -------------------------------------------------
> > > > > > > > > > ----
> > > > > > > > > > ------
> > > > > > > > > > ------
> > > > > > > > > > ----
> > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache
> > > > > > > > > > .org
> > > > > > > > > > For additional commands, e-mail: dev-help@qpid.apac
> > > > > > > > > > he.o
> > > > > > > > > > rg
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > -----------------------------------------------------
> > > > > > > > ----
> > > > > > > > ------
> > > > > > > > ------
> > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > > For additional commands, e-mail: dev-help@qpid.apache.o
> > > > > > > > rg
> > > > > > > > 
> > > > > > > > 
> > > > > > 
> > > > > > ---------------------------------------------------------
> > > > > > ----
> > > > > > --------
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > > 
> > > > > 
> > > > > 
> > > > > -----------------------------------------------------------
> > > > > ----
> > > > > ------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > 
> > > > > 
> > > > 
> > > > --
> > > > -K
> > > > 
> > > > -------------------------------------------------------------
> > > > ----
> > > > ----
> > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > > 
> > > 
> > > ---------------------------------------------------------------
> > > ------
> > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: users-help@qpid.apache.org
> > > 
> > 
> > 
> > -----------------------------------------------------------------
> > ----
> > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > For additional commands, e-mail: users-help@qpid.apache.org
> > 
> > 


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Rob Godfrey <ro...@gmail.com>.

If this API is not also making any decision on the (local) "timeout
threshold" then that seems eminently reasonable (and what anyone with a
passing familiarity of the spec would expect).

If the API is using the same call to also determine when the library will
actually consider the peer has ceased to function, then I would be more
concerned (albeit in an abstract way as I'm not actually currently using
the API at all :-) ).

-- Rob

On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:

> On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> > I don't think it's a bug - it's a completely valid (if chatty)
> > choice.
> >
> > To my mind the semantics of the field are this:
> >
> > The sender of the open frame is saying "if I do not receive any data
> > for this period of time, I reserve the right to assume that you are
> > no
> > longer functioning correctly"
> > On receiving this information, the peer should decide on a strategy
> > that ensures to the best of its ability, that if it is functioning
> > normally it will be generating data sufficiently frequently that the
> > condition does not occur.  If this peer knows that it is susceptible
> > to delays outside its direct control (such as garbage collection
> > pauses) it should take account of this in how often it schedules
> > sending heartbeat frames.
> > At the same time, the sender of this value should give some leeway
> > before actually shutting down to account for unexpected transport
> > delays, or processing delays on its side.  Since it is unaware of the
> > nature of the allowances made at the peer, it may decide to be
> > particularly generous in the leeway it grants.
> >
> > I think the spec mentioning a ratio (such as twice) is massively
> > unhelpful.
> >
> > In terms of the API - what is the applications intent in setting the
> > value?  Is the intent to ensure transport activity within a specific
> > period, or to set a hard limit on how long the library will wait
> > before generating some sort of timeout exception?
>
> My guess is that in terms of the API, the application's intent in
> calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> value called "idle-timeout" to T. I don't think anybody's first reading
> of the API doc or the spec would suggest to them that if they want to
> set a wire timeout of T they actually need to call
> set_idle_timeout(T*2) on one side of the connection and it will
> magically come out as T on the other.
>
> Spec wording aside, does anybody actually disagree with that?
>
>
> > -- Rob
> >
> >
> > On 30 September 2016 at 14:25, Ken Giusti <kg...@redhat.com> wrote:
> > >
> > > CC'ing user's list
> > >
> > > Given the interpretations of the spec discussed below is the way
> > > Proton handles this functionality inconsistent with itself?
> > >
> > > 1) Proton sends an open frame with _half_ the value of the timeout
> > > as set by the application.
> > > 2) Proton interprets the idle-timeout in the received open as the
> > > actual time out interval, and pessimistically ;) generates
> > > heartbeats as 1/2 that value.
> > >
> > > Is this a bug in the Proton implementation?
> > >
> > > -K
> > >
> > > ----- Original Message -----
> > > >
> > > > From: "Alan Conway" <ac...@redhat.com>
> > > > To: dev@qpid.apache.org
> > > > Sent: Thursday, September 29, 2016 9:11:51 AM
> > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > intervals.
> > > >
> > > > On Wed, 2016-09-28 at 22:33 +0100, Robbie Gemmell wrote:
> > > > >
> > > > > The spec unfortunately says what it says, and even if there
> > > > > were
> > > > > scope
> > > > > to update it, I'm not sure there is a way to change it to
> > > > > address the
> > > > > main issue (which regardless of any other clarity issues, is I
> > > > > think
> > > > > just that it only says you SHOULD advertise half your 'actual
> > > > > timeout') that wouldn't result in an 'incompatible change' of
> > > > > behaviour.
> > > >
> > > > I look at this differently. We agree on the semantics of idle-
> > > > time-out
> > > > defined by the spec. Now forget the spec wording, it is easy to
> > > > explain
> > > > clearly what the semantics are. It is not "half your actual time
> > > > out",
> > > > it is the *exact* frame rate you expect from your peer. The
> > > > peer's
> > > > *only* task is to respect that frame rate, they do not need any
> > > > more
> > > > information than that - they certainly don't need to guess at
> > > > what your
> > > > margin for error is for keeping the connection open.
> > > >
> > > > The margin for error is an implementation detail that does *not*
> > > > need
> > > > to be advertised. Double the frame rate is a reasonable general
> > > > recommendation, but the ideal margin depends on latency
> > > > variability in
> > > > the system, you can imagine systems where it might be better to
> > > > have a
> > > > larger or a smaller margin.
> > > >
> > > > The name 'idle-time-out' is not an ideal choice, 'heartbeat' or
> > > > 'max-
> > > > frame-delay' might be better. But once we know what it means, the
> > > > semantics are perfectly clear.
> > > >
> > > > >
> > > > > Knowing the exact enforced timeout could certainly be clearer
> > > > > in some
> > > > > ways, but if the period you had to send after wasnt defined
> > > > > (e.g
> > > > > half)
> > > > > then it would essentially leave you with the same problem as
> > > > > now:
> > > > > deciding exactly how early you should send heartbeat frames to
> > > > > avoid
> > > > > a
> > > > > [more-precisely-known] timeout.
> > > >
> > > > The only important thing is to agree on the meaning of the
> > > > advertised
> > > > value: is it the frame rate or the connection close threshold?
> > > > The work
> > > >  of adjusting at one end or the other is equivalent.
> > > >
> > > > I think we agree it is currently defined as the frame rate, I see
> > > > no
> > > > benefit in trying to change the meaning. (There would be no
> > > > drawback
> > > > either if the spec was not already published and implemented by
> > > > multiple implementors, but it is)
> > > >
> > > > >
> > > > >
> > > > > With what the spec says we effectively have a lower-bound on
> > > > > the
> > > > > actual-timeout (if cases the peer didnt half their actual-
> > > > > timeout
> > > > > before advertising a number) and an upper bound (if they did)
> > > > > due to
> > > > > the use of SHOULD as opposed to MUST or some other more fixed
> > > > > definition of behaviour. So we currently try to send frames
> > > > > often
> > > > > enough to satisfy that lower bound, i.e the number we received.
> > > > > That
> > > > > essentially reduces it to the the same problem as if we were
> > > > > trying
> > > > > to
> > > > > satisfy a theoretical exactly-known actual-timeout (just using
> > > > > a
> > > > > smaller number that we dont want to use, because we followed
> > > > > the
> > > > > specs
> > > > > recommendation). Currently we do this by sending frames after
> > > > > half
> > > > > the
> > > > > advertised period, i.e. what we know is actually a quarter of
> > > > > the
> > > > > actual-timeout enforced by peers such as proton which are
> > > > > following
> > > > > the specs recommendations. We could choose to do it less often
> > > > > than
> > > > > that by reducing how conservative it is, e.g use 75+% of
> > > > > advertised
> > > > > value instead for example, which should have no impact in the
> > > > > cases
> > > > > where peers had followed the recommendation (as we would still
> > > > > be
> > > > > sending more than twice as quickly as needed to satisfy their
> > > > > actual-timeout), but would put us closer to spurious timeout (
> > > > > if e.g
> > > > > there was a delay in delivering/processing the heartbeat)
> > > > > against any
> > > > > peers that didn't follow the recommendation.
> > > > >
> > > > > On 28 September 2016 at 21:26, Justin Ross <justin.ross@gmail.c
> > > > > om>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > IMO, the overall picture is simpler, and easier to explain to
> > > > > > third
> > > > > > parties, if we go the way Ken suggested.  When a remote peer
> > > > > > sends
> > > > > > you an
> > > > > > idle timeout value, it is an expression of an (actual, not
> > > > > > simply
> > > > > > "advertised") guarantee - "I will expire the connection after
> > > > > > X
> > > > > > time
> > > > > > without receiving a frame from you".
> > > > > >
> > > > > > We could also legitimately go the direction you
> > > > > > suggest.  *But* its
> > > > > > name is
> > > > > > "idle timeout".  We can't easily change the name.  I think we
> > > > > > should take
> > > > > > the spec text that goes with the name, and the behavior of
> > > > > > our
> > > > > > components,
> > > > > > firmly in the direction Ken suggests.
> > > > > >
> > > > > > Off topic: why is this on the dev list?
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 28, 2016 at 12:36 PM, Alan Conway <aconway@redhat
> > > > > > .com>
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > I've had a hand in the way Proton/C interprets the
> > > > > > > > meaning of
> > > > > > > > 'idle-
> > > > > > > > timeout' and I've never liked the solution.  I think
> > > > > > > > Proton/C's
> > > > > > > > behavior is not 'pessimistic' as much as it is
> > > > > > > > 'conservative'
> > > > > > > > for the
> > > > > > > > sake of interoperability.  This, unfortunately ends up
> > > > > > > > with a
> > > > > > > > needless idle frame chattiness when both ends are Proton-
> > > > > > > > based.
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > From: "Rob Godfrey" <ro...@gmail.com>
> > > > > > > > > To: "qpid" <de...@qpid.apache.org>
> > > > > > > > > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > > > > > > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > > > > > > intervals.
> > > > > > > > >
> > > > > > > > > I agree that specifying that the communicated figure
> > > > > > > > > should
> > > > > > > > > be
> > > > > > > > > "half"
> > > > > > > > > the "actual" timeout was a mistake.
> > > > > > > > >
> > > > > > > > > What the spec should have tried to communicate is that
> > > > > > > > > the
> > > > > > > > > sender
> > > > > > > > > should communicate a value somewhat less than the
> > > > > > > > > period it
> > > > > > > > > uses to
> > > > > > > > > determine that the connection has actually timed-out to
> > > > > > > > > allow
> > > > > > > > > for
> > > > > > > > > the
> > > > > > > > > receiver to process and emit a heartbeat frame.
> > > > > > > >
> > > > > > > >
> > > > > > > > Wouldn't it be much clearer to simply send the _actual_
> > > > > > > > idle
> > > > > > > > timeout
> > > > > > > > value?
> > > > > > >
> > > > > > > My read is that is exactly what it does: It sends the max
> > > > > > > time
> > > > > > > that the
> > > > > > > *sender* of frames may be idle. The receiver of frames
> > > > > > > SHOULD be
> > > > > > > more
> > > > > > > patient than that. The wording of the "discussion" around
> > > > > > > it and
> > > > > > > the
> > > > > > > choice of terms is a bit cloudy but, the text that
> > > > > > > describes
> > > > > > > idle-time-
> > > > > > > out seems clear enough: it is the max interval between
> > > > > > > sending
> > > > > > > frames.
> > > > > > > The frame receiver SHOULD wait longer that that before
> > > > > > > closing,
> > > > > > > and 2x
> > > > > > > seems a reasonable suggestion, but that's for the impl to
> > > > > > > decide.
> > > > > > >
> > > > > > > It's weird that it says "idle-time-out should be half the
> > > > > > > threshold"
> > > > > > > instead of "the threshold should be twice the idle-time-
> > > > > > > out" but
> > > > > > > it's
> > > > > > > logically equivalent.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Having the spec suggest "communicating a value *somewhat
> > > > > > > > less*"
> > > > > > >
> > > > > > > The wording is odd but the semantics are you communicate
> > > > > > > *exactly* the
> > > > > > > max frame delay you want and then you SHOULD set your
> > > > > > > connection
> > > > > > > close
> > > > > > > threshold to something bigger. The other end doesn't need
> > > > > > > to know
> > > > > > > how
> > > > > > > much bigger, they just need to know what rate to send
> > > > > > > frames.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > [emphasis mine] leaves the implementation open for
> > > > > > > > interpretation -
> > > > > > > > which is exactly how we got into this mess in the first
> > > > > > > > place.  Developers are a smart bunch - they know that
> > > > > > > > keep
> > > > > > > > alive
> > > > > > > > traffic will have to be sent frequently enough to prevent
> > > > > > > > idle
> > > > > > > > timeout.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  Similarly the sender
> > > > > > > > > should ensure that a frame has been emitted well within
> > > > > > > > > the
> > > > > > > > > timeout
> > > > > > > > > period to allow for any communication / processing
> > > > > > > > > delay.
> > > > > > > >
> > > > > > > > Agreed - perfectly acceptable for the spec to point this
> > > > > > > > out.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  In practice
> > > > > > > > > these "wiggle room" factors should not be determined by
> > > > > > > > > the
> > > > > > > > > application level timeout setting but by sensible
> > > > > > > > > calculations on
> > > > > > > > > transport delay variance / processing time,
> > > > > > > > > etc...  these
> > > > > > > > > calculation
> > > > > > > > > may differ between different use-cases / environments
> > > > > > > > > (for
> > > > > > > > > example
> > > > > > > > > in
> > > > > > > > > a low latency / real-time environment you may be able
> > > > > > > > > to make
> > > > > > > > > hard
> > > > > > > > > guarantees about the number of milliseconds that
> > > > > > > > > communication /
> > > > > > > > > processing delay will take... on the other hand if you
> > > > > > > > > are
> > > > > > > > > using an
> > > > > > > > > interpreted language with stop-the-world garbage
> > > > > > > > > collection
> > > > > > > > > you may
> > > > > > > > > not be able to say much better than the delay should be
> > > > > > > > > less
> > > > > > > > > than
> > > > > > > > > 30s
> > > > > > > > > or whatever).
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes - very important things to keep in mind when
> > > > > > > > implementing
> > > > > > > > this.  But the spec shouldn't be making these suggestions
> > > > > > > > for
> > > > > > > > different implementation options. The spec should be as
> > > > > > > > concise
> > > > > > > > as
> > > > > > > > possible about the mandated behavior, and leave the
> > > > > > > > implementation to
> > > > > > > > the developers.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think application level APIs should be in terms of
> > > > > > > > > the
> > > > > > > > > timeouts
> > > > > > > > > that
> > > > > > > > > will affect the application.  The AMQP library should
> > > > > > > > > be
> > > > > > > > > massaging
> > > > > > > > > those numbers in such a way that they can fulfil the
> > > > > > > > > application
> > > > > > > > > requirements.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Agreed.  Now, is there _any_ way we can suggest an update
> > > > > > > > to
> > > > > > > > the
> > > > > > > > spec?  Perhaps an errata, etc?
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -- Rob
> > > > > > > > >
> > > > > > > > > On 28 September 2016 at 10:42, Robbie Gemmell
> > > > > > > > > <robbie.gemmell
> > > > > > > > > @gmail
> > > > > > > > > .com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 27 September 2016 at 22:24, Alan Conway <aconway@r
> > > > > > > > > > edhat.
> > > > > > > > > > com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I want to clarify and document the meaning of
> > > > > > > > > > > > these
> > > > > > > > > > > > terms for
> > > > > > > > > > > > our
> > > > > > > > > > > > APIs,
> > > > > > > > > > > > presently I can't find anywhere where they are
> > > > > > > > > > > > documented
> > > > > > > > > > > > clearly.
> > > > > > > > > > > >
> > > > > > > > > > > > The AMQP spec says: "Each peer has its own
> > > > > > > > > > > > (independent) idle
> > > > > > > > > > > > timeout.
> > > > > > > > > > > > At connection open each peer communicates the
> > > > > > > > > > > > maximum
> > > > > > > > > > > > period between activity (frames) on the
> > > > > > > > > > > > connection that
> > > > > > > > > > > > it
> > > > > > > > > > > > desires
> > > > > > > > > > > > from
> > > > > > > > > > > > its partner.The open frame carries the idletime-
> > > > > > > > > > > > out
> > > > > > > > > > > > field for this purpose. To avoid spurious
> > > > > > > > > > > > timeouts, the
> > > > > > > > > > > > value
> > > > > > > > > > > > in
> > > > > > > > > > > > idle-
> > > > > > > > > > > > time-out SHOULD be half the peer’s
> > > > > > > > > > > > actual timeout threshold."
> > > > > > > > > > > >
> > > > > > > > > > > > In other words: if I send you an "open" frame
> > > > > > > > > > > > with
> > > > > > > > > > > > idle-time-
> > > > > > > > > > > > out=N
> > > > > > > > > > > > that
> > > > > > > > > > > > means *you* should not wait for longer than N
> > > > > > > > > > > > milliseconds to
> > > > > > > > > > > > send a
> > > > > > > > > > > > frame to me. It does not mean *I* will close the
> > > > > > > > > > > > connection
> > > > > > > > > > > > after N
> > > > > > > > > > > > milliseconds, I SHOULD be more patient and wait
> > > > > > > > > > > > for N*2
> > > > > > > > > > > > ms to
> > > > > > > > > > > > avoid
> > > > > > > > > > > > closing prematurely due to minor timing wobbles.
> > > > > > > > > > > >
> > > > > > > > > > > > I think the choice of name is slightly ambiguous
> > > > > > > > > > > > but
> > > > > > > > > > > > the spec
> > > > > > > > > > > > is
> > > > > > > > > > > > clear
> > > > > > > > > > > > on the semantics, so it's important to document
> > > > > > > > > > > > it to
> > > > > > > > > > > > remove
> > > > > > > > > > > > the
> > > > > > > > > > > > ambiguity.
> > > > > > > > > > > >
> > > > > > > > > > > > Anybody disagree?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Sigh. Sadly proton-C interprets "idle-timeout"
> > > > > > > > > > > differently
> > > > > > > > > > > depending on
> > > > > > > > > > > which end of the connection you are on:
> > > > > > > > > > >
> > > > > > > > > > >       // as per the recommendation in the spec,
> > > > > > > > > > > advertise
> > > > > > > > > > > half
> > > > > > > > > > > our
> > > > > > > > > > >       // actual timeout to the remote
> > > > > > > > > > >       const pn_millis_t idle_timeout = transport-
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > local_idle_timeout
> > > > > > > > > > >           ? (transport->local_idle_timeout/2)
> > > > > > > > > > >           : 0;
> > > > > > > > > > >
> > > > > > > > > > > So in proton, pn_set_idle_timeout does NOT mean set
> > > > > > > > > > > the
> > > > > > > > > > > AMQP
> > > > > > > > > > > idle-
> > > > > > > > > > > timeout value, it means set the local "receive
> > > > > > > > > > > timeout"
> > > > > > > > > > > value
> > > > > > > > > > > and send
> > > > > > > > > > > half that as the AMQP "send timeout" for the peer.
> > > > > > > > > > >
> > > > > > > > > > > I'm tempted to use a new term in the Go API:
> > > > > > > > > > > "heartbeat".
> > > > > > > > > > > To me
> > > > > > > > > > > that
> > > > > > > > > > > clearly means the "send timeout" (hearts beat, they
> > > > > > > > > > > don't
> > > > > > > > > > > listen for
> > > > > > > > > > > beats) so it coincides with the meaning of the AMQP
> > > > > > > > > > > "idle-
> > > > > > > > > > > timeout", but
> > > > > > > > > > > without the ambiguity that is exacerbated by proton
> > > > > > > > > > > interpreting it
> > > > > > > > > > > both ways.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Proton may seem to behave differently on each end,
> > > > > > > > > > but I
> > > > > > > > > > don't
> > > > > > > > > > think
> > > > > > > > > > its necessarily a bad thing that it does, and it is
> > > > > > > > > > also I
> > > > > > > > > > think
> > > > > > > > > > largely just reflecting an annoying bit in the spec
> > > > > > > > > > around
> > > > > > > > > > this
> > > > > > > > > > where
> > > > > > > > > > different behaviours are allowed for, whereas it
> > > > > > > > > > would be
> > > > > > > > > > easier
> > > > > > > > > > if it
> > > > > > > > > > had less wiggle room.
> > > > > > > > > >
> > > > > > > > > > The transport setter/getter for the local timeout
> > > > > > > > > > takes the
> > > > > > > > > > 'actual
> > > > > > > > > > timeout' and then sends half of it as the advertised
> > > > > > > > > > value
> > > > > > > > > > in the
> > > > > > > > > > Open
> > > > > > > > > > sent. This makes a certain amount of sense since it
> > > > > > > > > > ensures
> > > > > > > > > > that
> > > > > > > > > > appropriate behaviour is actually satisfied, rather
> > > > > > > > > > than
> > > > > > > > > > expecting the
> > > > > > > > > > user to ensure they only give half the value they
> > > > > > > > > > really
> > > > > > > > > > want for
> > > > > > > > > > their actual timeout. The getter for the remote
> > > > > > > > > > timeout
> > > > > > > > > > value on
> > > > > > > > > > the
> > > > > > > > > > other hand returns the advertised value from the Open
> > > > > > > > > > that
> > > > > > > > > > is
> > > > > > > > > > received. I expect it does that since it cant
> > > > > > > > > > actually ever
> > > > > > > > > > return the
> > > > > > > > > > remotes 'actual timeout' without making an
> > > > > > > > > > assumption, i.e
> > > > > > > > > > that
> > > > > > > > > > they
> > > > > > > > > > did in fact advertise half (or less) of their actual
> > > > > > > > > > timeout,
> > > > > > > > > > which
> > > > > > > > > > the spec only says that they SHOULD do.
> > > > > > > > > >
> > > > > > > > > > Yes the local setter taking the advertised value may
> > > > > > > > > > have
> > > > > > > > > > been
> > > > > > > > > > better
> > > > > > > > > > for method consistency with the remote getter. On the
> > > > > > > > > > other
> > > > > > > > > > hand,
> > > > > > > > > > sending of necessary heartbeats is handled directly
> > > > > > > > > > by the
> > > > > > > > > > transport
> > > > > > > > > > during the tick process, so users may not necessarily
> > > > > > > > > > even
> > > > > > > > > > use
> > > > > > > > > > the
> > > > > > > > > > getter themselves, and proton uses that remote value
> > > > > > > > > > internally
> > > > > > > > > > by
> > > > > > > > > > pessimistically halfing it to account for the case
> > > > > > > > > > that
> > > > > > > > > > folks on
> > > > > > > > > > the
> > > > > > > > > > other end did not advertise half their actual timeout
> > > > > > > > > > (since the
> > > > > > > > > > spec
> > > > > > > > > > doesnt require that they do). Side note: proton could
> > > > > > > > > > arguably be
> > > > > > > > > > less
> > > > > > > > > > pessimistic here and go for say a percentage much
> > > > > > > > > > nearer
> > > > > > > > > > the full
> > > > > > > > > > advertised value, but then you'd probably need to
> > > > > > > > > > start
> > > > > > > > > > guaging
> > > > > > > > > > how
> > > > > > > > > > close is too close.
> > > > > > > > > >
> > > > > > > > > > I think ensuring the doccumentation on the methods is
> > > > > > > > > > clear
> > > > > > > > > > what
> > > > > > > > > > they
> > > > > > > > > > do is sufficient enough here. I actually prefer idle-
> > > > > > > > > > timeout as
> > > > > > > > > > an
> > > > > > > > > > name rather than heartbeat due to the way this all
> > > > > > > > > > works.
> > > > > > > > > > Since
> > > > > > > > > > you
> > > > > > > > > > only tell the other side [half] your timeout, you
> > > > > > > > > > dont
> > > > > > > > > > actually
> > > > > > > > > > have
> > > > > > > > > > direct control over when they send any needed empty
> > > > > > > > > > frames
> > > > > > > > > > to
> > > > > > > > > > satisfy
> > > > > > > > > > it (as the above shows, we might send them more often
> > > > > > > > > > than
> > > > > > > > > > they
> > > > > > > > > > require) and 'heartbeat' might seem to imply that you
> > > > > > > > > > do,
> > > > > > > > > > and
> > > > > > > > > > possibly
> > > > > > > > > > even that they need be sent at that period all the
> > > > > > > > > > time
> > > > > > > > > > even
> > > > > > > > > > despite
> > > > > > > > > > regular traffic, which is not the case.
> > > > > > > > > >
> > > > > > > > > > Robbie
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------
> > > > > > > > > > ------
> > > > > > > > > > ------
> > > > > > > > > > ------
> > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.o
> > > > > > > > > > rg
> > > > > > > > > > For additional commands, e-mail: dev-help@qpid.apache
> > > > > > > > > > .org
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > -----------------------------------------------------
> > > > > > > > > ------
> > > > > > > > > ------
> > > > > > > > > ----
> > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > > > For additional commands, e-mail: dev-help@qpid.apache.o
> > > > > > > > > rg
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------
> > > > > > > ------
> > > > > > > ------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > > >
> > > > > > >
> > > > >
> > > > > -------------------------------------------------------------
> > > > > --------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------
> > > > ------
> > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > >
> > > >
> > >
> > > --
> > > -K
> > >
> > > -----------------------------------------------------------------
> > > ----
> > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: users-help@qpid.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > For additional commands, e-mail: users-help@qpid.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Ken Giusti <kg...@redhat.com>.

Robbie beat me to it...

In proton-c pn_transport_set_idle_timeout sets the actual local timeout interval.

The value of the Open frame's idle-timeout field is 1/2 the value set by pn_transport_set_idle_timeout, as suggested by the spec.

:/

https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;a=blob;f=proton-c/src/transport/transport.c;h=cdecfd291139acfb1c90698677abc0efc4056f50;hb=HEAD#l2539

----- Original Message -----
> From: "Robbie Gemmell" <ro...@gmail.com>
> To: dev@qpid.apache.org
> Cc: users@qpid.apache.org
> Sent: Monday, October 17, 2016 12:32:13 PM
> Subject: Re: API and terms: idle-time-out and heartbeat intervals.
> 
> On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> > On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> >> I don't think it's a bug - it's a completely valid (if chatty)
> >> choice.
> >>
> >> To my mind the semantics of the field are this:
> >>
> >> The sender of the open frame is saying "if I do not receive any data
> >> for this period of time, I reserve the right to assume that you are
> >> no
> >> longer functioning correctly"
> >> On receiving this information, the peer should decide on a strategy
> >> that ensures to the best of its ability, that if it is functioning
> >> normally it will be generating data sufficiently frequently that the
> >> condition does not occur.  If this peer knows that it is susceptible
> >> to delays outside its direct control (such as garbage collection
> >> pauses) it should take account of this in how often it schedules
> >> sending heartbeat frames.
> >> At the same time, the sender of this value should give some leeway
> >> before actually shutting down to account for unexpected transport
> >> delays, or processing delays on its side.  Since it is unaware of the
> >> nature of the allowances made at the peer, it may decide to be
> >> particularly generous in the leeway it grants.
> >>
> >> I think the spec mentioning a ratio (such as twice) is massively
> >> unhelpful.
> >>
> >> In terms of the API - what is the applications intent in setting the
> >> value?  Is the intent to ensure transport activity within a specific
> >> period, or to set a hard limit on how long the library will wait
> >> before generating some sort of timeout exception?
> >
> > My guess is that in terms of the API, the application's intent in
> > calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> > value called "idle-timeout" to T. I don't think anybody's first reading
> > of the API doc or the spec would suggest to them that if they want to
> > set a wire timeout of T they actually need to call
> > set_idle_timeout(T*2) on one side of the connection and it will
> > magically come out as T on the other.
> >
> > Spec wording aside, does anybody actually disagree with that?
> > <snip/>
> 
> If you dont consider the spec wording, I think that an application
> intent in calling a method named setIdleTimeout would likely be to
> consider it as the time after which something is considered timed out
> if it is idle. This is what it does now.
> 
> If we called it setOpenFrameIdleTimeout, I think they could definitely
> expect it to set the value populated in the Open frame, before needing
> to read the API to see when things are actually considered timed out.
> 
> If we called it setHeartbeat I think we could resonably expect folks
> going either way on their thinking of what it will do before reading
> the API to find out what it actually does.
> 
> In the end, its simply not the clearest area due to the spec wording,
> which most users will never read, and I think its fine so long as our
> methods/config actually document what they do.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
> 
> 

-- 
-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Ken Giusti <kg...@redhat.com>.

Robbie beat me to it...

In proton-c pn_transport_set_idle_timeout sets the actual local timeout interval.

The value of the Open frame's idle-timeout field is 1/2 the value set by pn_transport_set_idle_timeout, as suggested by the spec.

:/

https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;a=blob;f=proton-c/src/transport/transport.c;h=cdecfd291139acfb1c90698677abc0efc4056f50;hb=HEAD#l2539

----- Original Message -----
> From: "Robbie Gemmell" <ro...@gmail.com>
> To: dev@qpid.apache.org
> Cc: users@qpid.apache.org
> Sent: Monday, October 17, 2016 12:32:13 PM
> Subject: Re: API and terms: idle-time-out and heartbeat intervals.
> 
> On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> > On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> >> I don't think it's a bug - it's a completely valid (if chatty)
> >> choice.
> >>
> >> To my mind the semantics of the field are this:
> >>
> >> The sender of the open frame is saying "if I do not receive any data
> >> for this period of time, I reserve the right to assume that you are
> >> no
> >> longer functioning correctly"
> >> On receiving this information, the peer should decide on a strategy
> >> that ensures to the best of its ability, that if it is functioning
> >> normally it will be generating data sufficiently frequently that the
> >> condition does not occur.  If this peer knows that it is susceptible
> >> to delays outside its direct control (such as garbage collection
> >> pauses) it should take account of this in how often it schedules
> >> sending heartbeat frames.
> >> At the same time, the sender of this value should give some leeway
> >> before actually shutting down to account for unexpected transport
> >> delays, or processing delays on its side.  Since it is unaware of the
> >> nature of the allowances made at the peer, it may decide to be
> >> particularly generous in the leeway it grants.
> >>
> >> I think the spec mentioning a ratio (such as twice) is massively
> >> unhelpful.
> >>
> >> In terms of the API - what is the applications intent in setting the
> >> value?  Is the intent to ensure transport activity within a specific
> >> period, or to set a hard limit on how long the library will wait
> >> before generating some sort of timeout exception?
> >
> > My guess is that in terms of the API, the application's intent in
> > calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> > value called "idle-timeout" to T. I don't think anybody's first reading
> > of the API doc or the spec would suggest to them that if they want to
> > set a wire timeout of T they actually need to call
> > set_idle_timeout(T*2) on one side of the connection and it will
> > magically come out as T on the other.
> >
> > Spec wording aside, does anybody actually disagree with that?
> > <snip/>
> 
> If you dont consider the spec wording, I think that an application
> intent in calling a method named setIdleTimeout would likely be to
> consider it as the time after which something is considered timed out
> if it is idle. This is what it does now.
> 
> If we called it setOpenFrameIdleTimeout, I think they could definitely
> expect it to set the value populated in the Open frame, before needing
> to read the API to see when things are actually considered timed out.
> 
> If we called it setHeartbeat I think we could resonably expect folks
> going either way on their thinking of what it will do before reading
> the API to find out what it actually does.
> 
> In the end, its simply not the clearest area due to the spec wording,
> which most users will never read, and I think its fine so long as our
> methods/config actually document what they do.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> For additional commands, e-mail: dev-help@qpid.apache.org
> 
> 

-- 
-K

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Robbie Gemmell <ro...@gmail.com>.

On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
>> I don't think it's a bug - it's a completely valid (if chatty)
>> choice.
>>
>> To my mind the semantics of the field are this:
>>
>> The sender of the open frame is saying "if I do not receive any data
>> for this period of time, I reserve the right to assume that you are
>> no
>> longer functioning correctly"
>> On receiving this information, the peer should decide on a strategy
>> that ensures to the best of its ability, that if it is functioning
>> normally it will be generating data sufficiently frequently that the
>> condition does not occur.  If this peer knows that it is susceptible
>> to delays outside its direct control (such as garbage collection
>> pauses) it should take account of this in how often it schedules
>> sending heartbeat frames.
>> At the same time, the sender of this value should give some leeway
>> before actually shutting down to account for unexpected transport
>> delays, or processing delays on its side.  Since it is unaware of the
>> nature of the allowances made at the peer, it may decide to be
>> particularly generous in the leeway it grants.
>>
>> I think the spec mentioning a ratio (such as twice) is massively
>> unhelpful.
>>
>> In terms of the API - what is the applications intent in setting the
>> value?  Is the intent to ensure transport activity within a specific
>> period, or to set a hard limit on how long the library will wait
>> before generating some sort of timeout exception?
>
> My guess is that in terms of the API, the application's intent in
> calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> value called "idle-timeout" to T. I don't think anybody's first reading
> of the API doc or the spec would suggest to them that if they want to
> set a wire timeout of T they actually need to call
> set_idle_timeout(T*2) on one side of the connection and it will
> magically come out as T on the other.
>
> Spec wording aside, does anybody actually disagree with that?
> <snip/>

If you dont consider the spec wording, I think that an application
intent in calling a method named setIdleTimeout would likely be to
consider it as the time after which something is considered timed out
if it is idle. This is what it does now.

If we called it setOpenFrameIdleTimeout, I think they could definitely
expect it to set the value populated in the Open frame, before needing
to read the API to see when things are actually considered timed out.

If we called it setHeartbeat I think we could resonably expect folks
going either way on their thinking of what it will do before reading
the API to find out what it actually does.

In the end, its simply not the clearest area due to the spec wording,
which most users will never read, and I think its fine so long as our
methods/config actually document what they do.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
For additional commands, e-mail: dev-help@qpid.apache.org

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Robbie Gemmell <ro...@gmail.com>.

On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:
> On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
>> I don't think it's a bug - it's a completely valid (if chatty)
>> choice.
>>
>> To my mind the semantics of the field are this:
>>
>> The sender of the open frame is saying "if I do not receive any data
>> for this period of time, I reserve the right to assume that you are
>> no
>> longer functioning correctly"
>> On receiving this information, the peer should decide on a strategy
>> that ensures to the best of its ability, that if it is functioning
>> normally it will be generating data sufficiently frequently that the
>> condition does not occur.  If this peer knows that it is susceptible
>> to delays outside its direct control (such as garbage collection
>> pauses) it should take account of this in how often it schedules
>> sending heartbeat frames.
>> At the same time, the sender of this value should give some leeway
>> before actually shutting down to account for unexpected transport
>> delays, or processing delays on its side.  Since it is unaware of the
>> nature of the allowances made at the peer, it may decide to be
>> particularly generous in the leeway it grants.
>>
>> I think the spec mentioning a ratio (such as twice) is massively
>> unhelpful.
>>
>> In terms of the API - what is the applications intent in setting the
>> value?  Is the intent to ensure transport activity within a specific
>> period, or to set a hard limit on how long the library will wait
>> before generating some sort of timeout exception?
>
> My guess is that in terms of the API, the application's intent in
> calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> value called "idle-timeout" to T. I don't think anybody's first reading
> of the API doc or the spec would suggest to them that if they want to
> set a wire timeout of T they actually need to call
> set_idle_timeout(T*2) on one side of the connection and it will
> magically come out as T on the other.
>
> Spec wording aside, does anybody actually disagree with that?
> <snip/>

If you dont consider the spec wording, I think that an application
intent in calling a method named setIdleTimeout would likely be to
consider it as the time after which something is considered timed out
if it is idle. This is what it does now.

If we called it setOpenFrameIdleTimeout, I think they could definitely
expect it to set the value populated in the Open frame, before needing
to read the API to see when things are actually considered timed out.

If we called it setHeartbeat I think we could resonably expect folks
going either way on their thinking of what it will do before reading
the API to find out what it actually does.

In the end, its simply not the clearest area due to the spec wording,
which most users will never read, and I think its fine so long as our
methods/config actually document what they do.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org

Re: API and terms: idle-time-out and heartbeat intervals.

Posted by Rob Godfrey <ro...@gmail.com>.

If this API is not also making any decision on the (local) "timeout
threshold" then that seems eminently reasonable (and what anyone with a
passing familiarity of the spec would expect).

If the API is using the same call to also determine when the library will
actually consider the peer has ceased to function, then I would be more
concerned (albeit in an abstract way as I'm not actually currently using
the API at all :-) ).

-- Rob

On 17 October 2016 at 16:57, Alan Conway <ac...@redhat.com> wrote:

> On Fri, 2016-09-30 at 14:40 +0100, Rob Godfrey wrote:
> > I don't think it's a bug - it's a completely valid (if chatty)
> > choice.
> >
> > To my mind the semantics of the field are this:
> >
> > The sender of the open frame is saying "if I do not receive any data
> > for this period of time, I reserve the right to assume that you are
> > no
> > longer functioning correctly"
> > On receiving this information, the peer should decide on a strategy
> > that ensures to the best of its ability, that if it is functioning
> > normally it will be generating data sufficiently frequently that the
> > condition does not occur.  If this peer knows that it is susceptible
> > to delays outside its direct control (such as garbage collection
> > pauses) it should take account of this in how often it schedules
> > sending heartbeat frames.
> > At the same time, the sender of this value should give some leeway
> > before actually shutting down to account for unexpected transport
> > delays, or processing delays on its side.  Since it is unaware of the
> > nature of the allowances made at the peer, it may decide to be
> > particularly generous in the leeway it grants.
> >
> > I think the spec mentioning a ratio (such as twice) is massively
> > unhelpful.
> >
> > In terms of the API - what is the applications intent in setting the
> > value?  Is the intent to ensure transport activity within a specific
> > period, or to set a hard limit on how long the library will wait
> > before generating some sort of timeout exception?
>
> My guess is that in terms of the API, the application's intent in
> calling "set_idle_timeout(T)" is to set the formally-specified AMQP
> value called "idle-timeout" to T. I don't think anybody's first reading
> of the API doc or the spec would suggest to them that if they want to
> set a wire timeout of T they actually need to call
> set_idle_timeout(T*2) on one side of the connection and it will
> magically come out as T on the other.
>
> Spec wording aside, does anybody actually disagree with that?
>
>
> > -- Rob
> >
> >
> > On 30 September 2016 at 14:25, Ken Giusti <kg...@redhat.com> wrote:
> > >
> > > CC'ing user's list
> > >
> > > Given the interpretations of the spec discussed below is the way
> > > Proton handles this functionality inconsistent with itself?
> > >
> > > 1) Proton sends an open frame with _half_ the value of the timeout
> > > as set by the application.
> > > 2) Proton interprets the idle-timeout in the received open as the
> > > actual time out interval, and pessimistically ;) generates
> > > heartbeats as 1/2 that value.
> > >
> > > Is this a bug in the Proton implementation?
> > >
> > > -K
> > >
> > > ----- Original Message -----
> > > >
> > > > From: "Alan Conway" <ac...@redhat.com>
> > > > To: dev@qpid.apache.org
> > > > Sent: Thursday, September 29, 2016 9:11:51 AM
> > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > intervals.
> > > >
> > > > On Wed, 2016-09-28 at 22:33 +0100, Robbie Gemmell wrote:
> > > > >
> > > > > The spec unfortunately says what it says, and even if there
> > > > > were
> > > > > scope
> > > > > to update it, I'm not sure there is a way to change it to
> > > > > address the
> > > > > main issue (which regardless of any other clarity issues, is I
> > > > > think
> > > > > just that it only says you SHOULD advertise half your 'actual
> > > > > timeout') that wouldn't result in an 'incompatible change' of
> > > > > behaviour.
> > > >
> > > > I look at this differently. We agree on the semantics of idle-
> > > > time-out
> > > > defined by the spec. Now forget the spec wording, it is easy to
> > > > explain
> > > > clearly what the semantics are. It is not "half your actual time
> > > > out",
> > > > it is the *exact* frame rate you expect from your peer. The
> > > > peer's
> > > > *only* task is to respect that frame rate, they do not need any
> > > > more
> > > > information than that - they certainly don't need to guess at
> > > > what your
> > > > margin for error is for keeping the connection open.
> > > >
> > > > The margin for error is an implementation detail that does *not*
> > > > need
> > > > to be advertised. Double the frame rate is a reasonable general
> > > > recommendation, but the ideal margin depends on latency
> > > > variability in
> > > > the system, you can imagine systems where it might be better to
> > > > have a
> > > > larger or a smaller margin.
> > > >
> > > > The name 'idle-time-out' is not an ideal choice, 'heartbeat' or
> > > > 'max-
> > > > frame-delay' might be better. But once we know what it means, the
> > > > semantics are perfectly clear.
> > > >
> > > > >
> > > > > Knowing the exact enforced timeout could certainly be clearer
> > > > > in some
> > > > > ways, but if the period you had to send after wasnt defined
> > > > > (e.g
> > > > > half)
> > > > > then it would essentially leave you with the same problem as
> > > > > now:
> > > > > deciding exactly how early you should send heartbeat frames to
> > > > > avoid
> > > > > a
> > > > > [more-precisely-known] timeout.
> > > >
> > > > The only important thing is to agree on the meaning of the
> > > > advertised
> > > > value: is it the frame rate or the connection close threshold?
> > > > The work
> > > >  of adjusting at one end or the other is equivalent.
> > > >
> > > > I think we agree it is currently defined as the frame rate, I see
> > > > no
> > > > benefit in trying to change the meaning. (There would be no
> > > > drawback
> > > > either if the spec was not already published and implemented by
> > > > multiple implementors, but it is)
> > > >
> > > > >
> > > > >
> > > > > With what the spec says we effectively have a lower-bound on
> > > > > the
> > > > > actual-timeout (if cases the peer didnt half their actual-
> > > > > timeout
> > > > > before advertising a number) and an upper bound (if they did)
> > > > > due to
> > > > > the use of SHOULD as opposed to MUST or some other more fixed
> > > > > definition of behaviour. So we currently try to send frames
> > > > > often
> > > > > enough to satisfy that lower bound, i.e the number we received.
> > > > > That
> > > > > essentially reduces it to the the same problem as if we were
> > > > > trying
> > > > > to
> > > > > satisfy a theoretical exactly-known actual-timeout (just using
> > > > > a
> > > > > smaller number that we dont want to use, because we followed
> > > > > the
> > > > > specs
> > > > > recommendation). Currently we do this by sending frames after
> > > > > half
> > > > > the
> > > > > advertised period, i.e. what we know is actually a quarter of
> > > > > the
> > > > > actual-timeout enforced by peers such as proton which are
> > > > > following
> > > > > the specs recommendations. We could choose to do it less often
> > > > > than
> > > > > that by reducing how conservative it is, e.g use 75+% of
> > > > > advertised
> > > > > value instead for example, which should have no impact in the
> > > > > cases
> > > > > where peers had followed the recommendation (as we would still
> > > > > be
> > > > > sending more than twice as quickly as needed to satisfy their
> > > > > actual-timeout), but would put us closer to spurious timeout (
> > > > > if e.g
> > > > > there was a delay in delivering/processing the heartbeat)
> > > > > against any
> > > > > peers that didn't follow the recommendation.
> > > > >
> > > > > On 28 September 2016 at 21:26, Justin Ross <justin.ross@gmail.c
> > > > > om>
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > IMO, the overall picture is simpler, and easier to explain to
> > > > > > third
> > > > > > parties, if we go the way Ken suggested.  When a remote peer
> > > > > > sends
> > > > > > you an
> > > > > > idle timeout value, it is an expression of an (actual, not
> > > > > > simply
> > > > > > "advertised") guarantee - "I will expire the connection after
> > > > > > X
> > > > > > time
> > > > > > without receiving a frame from you".
> > > > > >
> > > > > > We could also legitimately go the direction you
> > > > > > suggest.  *But* its
> > > > > > name is
> > > > > > "idle timeout".  We can't easily change the name.  I think we
> > > > > > should take
> > > > > > the spec text that goes with the name, and the behavior of
> > > > > > our
> > > > > > components,
> > > > > > firmly in the direction Ken suggests.
> > > > > >
> > > > > > Off topic: why is this on the dev list?
> > > > > >
> > > > > >
> > > > > > On Wed, Sep 28, 2016 at 12:36 PM, Alan Conway <aconway@redhat
> > > > > > .com>
> > > > > > wrote:
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, 2016-09-28 at 10:13 -0400, Ken Giusti wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > I've had a hand in the way Proton/C interprets the
> > > > > > > > meaning of
> > > > > > > > 'idle-
> > > > > > > > timeout' and I've never liked the solution.  I think
> > > > > > > > Proton/C's
> > > > > > > > behavior is not 'pessimistic' as much as it is
> > > > > > > > 'conservative'
> > > > > > > > for the
> > > > > > > > sake of interoperability.  This, unfortunately ends up
> > > > > > > > with a
> > > > > > > > needless idle frame chattiness when both ends are Proton-
> > > > > > > > based.
> > > > > > > >
> > > > > > > > ----- Original Message -----
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > From: "Rob Godfrey" <ro...@gmail.com>
> > > > > > > > > To: "qpid" <de...@qpid.apache.org>
> > > > > > > > > Sent: Wednesday, September 28, 2016 6:19:05 AM
> > > > > > > > > Subject: Re: API and terms: idle-time-out and heartbeat
> > > > > > > > > intervals.
> > > > > > > > >
> > > > > > > > > I agree that specifying that the communicated figure
> > > > > > > > > should
> > > > > > > > > be
> > > > > > > > > "half"
> > > > > > > > > the "actual" timeout was a mistake.
> > > > > > > > >
> > > > > > > > > What the spec should have tried to communicate is that
> > > > > > > > > the
> > > > > > > > > sender
> > > > > > > > > should communicate a value somewhat less than the
> > > > > > > > > period it
> > > > > > > > > uses to
> > > > > > > > > determine that the connection has actually timed-out to
> > > > > > > > > allow
> > > > > > > > > for
> > > > > > > > > the
> > > > > > > > > receiver to process and emit a heartbeat frame.
> > > > > > > >
> > > > > > > >
> > > > > > > > Wouldn't it be much clearer to simply send the _actual_
> > > > > > > > idle
> > > > > > > > timeout
> > > > > > > > value?
> > > > > > >
> > > > > > > My read is that is exactly what it does: It sends the max
> > > > > > > time
> > > > > > > that the
> > > > > > > *sender* of frames may be idle. The receiver of frames
> > > > > > > SHOULD be
> > > > > > > more
> > > > > > > patient than that. The wording of the "discussion" around
> > > > > > > it and
> > > > > > > the
> > > > > > > choice of terms is a bit cloudy but, the text that
> > > > > > > describes
> > > > > > > idle-time-
> > > > > > > out seems clear enough: it is the max interval between
> > > > > > > sending
> > > > > > > frames.
> > > > > > > The frame receiver SHOULD wait longer that that before
> > > > > > > closing,
> > > > > > > and 2x
> > > > > > > seems a reasonable suggestion, but that's for the impl to
> > > > > > > decide.
> > > > > > >
> > > > > > > It's weird that it says "idle-time-out should be half the
> > > > > > > threshold"
> > > > > > > instead of "the threshold should be twice the idle-time-
> > > > > > > out" but
> > > > > > > it's
> > > > > > > logically equivalent.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Having the spec suggest "communicating a value *somewhat
> > > > > > > > less*"
> > > > > > >
> > > > > > > The wording is odd but the semantics are you communicate
> > > > > > > *exactly* the
> > > > > > > max frame delay you want and then you SHOULD set your
> > > > > > > connection
> > > > > > > close
> > > > > > > threshold to something bigger. The other end doesn't need
> > > > > > > to know
> > > > > > > how
> > > > > > > much bigger, they just need to know what rate to send
> > > > > > > frames.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > [emphasis mine] leaves the implementation open for
> > > > > > > > interpretation -
> > > > > > > > which is exactly how we got into this mess in the first
> > > > > > > > place.  Developers are a smart bunch - they know that
> > > > > > > > keep
> > > > > > > > alive
> > > > > > > > traffic will have to be sent frequently enough to prevent
> > > > > > > > idle
> > > > > > > > timeout.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  Similarly the sender
> > > > > > > > > should ensure that a frame has been emitted well within
> > > > > > > > > the
> > > > > > > > > timeout
> > > > > > > > > period to allow for any communication / processing
> > > > > > > > > delay.
> > > > > > > >
> > > > > > > > Agreed - perfectly acceptable for the spec to point this
> > > > > > > > out.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >  In practice
> > > > > > > > > these "wiggle room" factors should not be determined by
> > > > > > > > > the
> > > > > > > > > application level timeout setting but by sensible
> > > > > > > > > calculations on
> > > > > > > > > transport delay variance / processing time,
> > > > > > > > > etc...  these
> > > > > > > > > calculation
> > > > > > > > > may differ between different use-cases / environments
> > > > > > > > > (for
> > > > > > > > > example
> > > > > > > > > in
> > > > > > > > > a low latency / real-time environment you may be able
> > > > > > > > > to make
> > > > > > > > > hard
> > > > > > > > > guarantees about the number of milliseconds that
> > > > > > > > > communication /
> > > > > > > > > processing delay will take... on the other hand if you
> > > > > > > > > are
> > > > > > > > > using an
> > > > > > > > > interpreted language with stop-the-world garbage
> > > > > > > > > collection
> > > > > > > > > you may
> > > > > > > > > not be able to say much better than the delay should be
> > > > > > > > > less
> > > > > > > > > than
> > > > > > > > > 30s
> > > > > > > > > or whatever).
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes - very important things to keep in mind when
> > > > > > > > implementing
> > > > > > > > this.  But the spec shouldn't be making these suggestions
> > > > > > > > for
> > > > > > > > different implementation options. The spec should be as
> > > > > > > > concise
> > > > > > > > as
> > > > > > > > possible about the mandated behavior, and leave the
> > > > > > > > implementation to
> > > > > > > > the developers.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I think application level APIs should be in terms of
> > > > > > > > > the
> > > > > > > > > timeouts
> > > > > > > > > that
> > > > > > > > > will affect the application.  The AMQP library should
> > > > > > > > > be
> > > > > > > > > massaging
> > > > > > > > > those numbers in such a way that they can fulfil the
> > > > > > > > > application
> > > > > > > > > requirements.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Agreed.  Now, is there _any_ way we can suggest an update
> > > > > > > > to
> > > > > > > > the
> > > > > > > > spec?  Perhaps an errata, etc?
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > -- Rob
> > > > > > > > >
> > > > > > > > > On 28 September 2016 at 10:42, Robbie Gemmell
> > > > > > > > > <robbie.gemmell
> > > > > > > > > @gmail
> > > > > > > > > .com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 27 September 2016 at 22:24, Alan Conway <aconway@r
> > > > > > > > > > edhat.
> > > > > > > > > > com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 2016-09-27 at 15:37 -0400, Alan Conway
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I want to clarify and document the meaning of
> > > > > > > > > > > > these
> > > > > > > > > > > > terms for
> > > > > > > > > > > > our
> > > > > > > > > > > > APIs,
> > > > > > > > > > > > presently I can't find anywhere where they are
> > > > > > > > > > > > documented
> > > > > > > > > > > > clearly.
> > > > > > > > > > > >
> > > > > > > > > > > > The AMQP spec says: "Each peer has its own
> > > > > > > > > > > > (independent) idle
> > > > > > > > > > > > timeout.
> > > > > > > > > > > > At connection open each peer communicates the
> > > > > > > > > > > > maximum
> > > > > > > > > > > > period between activity (frames) on the
> > > > > > > > > > > > connection that
> > > > > > > > > > > > it
> > > > > > > > > > > > desires
> > > > > > > > > > > > from
> > > > > > > > > > > > its partner.The open frame carries the idletime-
> > > > > > > > > > > > out
> > > > > > > > > > > > field for this purpose. To avoid spurious
> > > > > > > > > > > > timeouts, the
> > > > > > > > > > > > value
> > > > > > > > > > > > in
> > > > > > > > > > > > idle-
> > > > > > > > > > > > time-out SHOULD be half the peer’s
> > > > > > > > > > > > actual timeout threshold."
> > > > > > > > > > > >
> > > > > > > > > > > > In other words: if I send you an "open" frame
> > > > > > > > > > > > with
> > > > > > > > > > > > idle-time-
> > > > > > > > > > > > out=N
> > > > > > > > > > > > that
> > > > > > > > > > > > means *you* should not wait for longer than N
> > > > > > > > > > > > milliseconds to
> > > > > > > > > > > > send a
> > > > > > > > > > > > frame to me. It does not mean *I* will close the
> > > > > > > > > > > > connection
> > > > > > > > > > > > after N
> > > > > > > > > > > > milliseconds, I SHOULD be more patient and wait
> > > > > > > > > > > > for N*2
> > > > > > > > > > > > ms to
> > > > > > > > > > > > avoid
> > > > > > > > > > > > closing prematurely due to minor timing wobbles.
> > > > > > > > > > > >
> > > > > > > > > > > > I think the choice of name is slightly ambiguous
> > > > > > > > > > > > but
> > > > > > > > > > > > the spec
> > > > > > > > > > > > is
> > > > > > > > > > > > clear
> > > > > > > > > > > > on the semantics, so it's important to document
> > > > > > > > > > > > it to
> > > > > > > > > > > > remove
> > > > > > > > > > > > the
> > > > > > > > > > > > ambiguity.
> > > > > > > > > > > >
> > > > > > > > > > > > Anybody disagree?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Sigh. Sadly proton-C interprets "idle-timeout"
> > > > > > > > > > > differently
> > > > > > > > > > > depending on
> > > > > > > > > > > which end of the connection you are on:
> > > > > > > > > > >
> > > > > > > > > > >       // as per the recommendation in the spec,
> > > > > > > > > > > advertise
> > > > > > > > > > > half
> > > > > > > > > > > our
> > > > > > > > > > >       // actual timeout to the remote
> > > > > > > > > > >       const pn_millis_t idle_timeout = transport-
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > local_idle_timeout
> > > > > > > > > > >           ? (transport->local_idle_timeout/2)
> > > > > > > > > > >           : 0;
> > > > > > > > > > >
> > > > > > > > > > > So in proton, pn_set_idle_timeout does NOT mean set
> > > > > > > > > > > the
> > > > > > > > > > > AMQP
> > > > > > > > > > > idle-
> > > > > > > > > > > timeout value, it means set the local "receive
> > > > > > > > > > > timeout"
> > > > > > > > > > > value
> > > > > > > > > > > and send
> > > > > > > > > > > half that as the AMQP "send timeout" for the peer.
> > > > > > > > > > >
> > > > > > > > > > > I'm tempted to use a new term in the Go API:
> > > > > > > > > > > "heartbeat".
> > > > > > > > > > > To me
> > > > > > > > > > > that
> > > > > > > > > > > clearly means the "send timeout" (hearts beat, they
> > > > > > > > > > > don't
> > > > > > > > > > > listen for
> > > > > > > > > > > beats) so it coincides with the meaning of the AMQP
> > > > > > > > > > > "idle-
> > > > > > > > > > > timeout", but
> > > > > > > > > > > without the ambiguity that is exacerbated by proton
> > > > > > > > > > > interpreting it
> > > > > > > > > > > both ways.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Proton may seem to behave differently on each end,
> > > > > > > > > > but I
> > > > > > > > > > don't
> > > > > > > > > > think
> > > > > > > > > > its necessarily a bad thing that it does, and it is
> > > > > > > > > > also I
> > > > > > > > > > think
> > > > > > > > > > largely just reflecting an annoying bit in the spec
> > > > > > > > > > around
> > > > > > > > > > this
> > > > > > > > > > where
> > > > > > > > > > different behaviours are allowed for, whereas it
> > > > > > > > > > would be
> > > > > > > > > > easier
> > > > > > > > > > if it
> > > > > > > > > > had less wiggle room.
> > > > > > > > > >
> > > > > > > > > > The transport setter/getter for the local timeout
> > > > > > > > > > takes the
> > > > > > > > > > 'actual
> > > > > > > > > > timeout' and then sends half of it as the advertised
> > > > > > > > > > value
> > > > > > > > > > in the
> > > > > > > > > > Open
> > > > > > > > > > sent. This makes a certain amount of sense since it
> > > > > > > > > > ensures
> > > > > > > > > > that
> > > > > > > > > > appropriate behaviour is actually satisfied, rather
> > > > > > > > > > than
> > > > > > > > > > expecting the
> > > > > > > > > > user to ensure they only give half the value they
> > > > > > > > > > really
> > > > > > > > > > want for
> > > > > > > > > > their actual timeout. The getter for the remote
> > > > > > > > > > timeout
> > > > > > > > > > value on
> > > > > > > > > > the
> > > > > > > > > > other hand returns the advertised value from the Open
> > > > > > > > > > that
> > > > > > > > > > is
> > > > > > > > > > received. I expect it does that since it cant
> > > > > > > > > > actually ever
> > > > > > > > > > return the
> > > > > > > > > > remotes 'actual timeout' without making an
> > > > > > > > > > assumption, i.e
> > > > > > > > > > that
> > > > > > > > > > they
> > > > > > > > > > did in fact advertise half (or less) of their actual
> > > > > > > > > > timeout,
> > > > > > > > > > which
> > > > > > > > > > the spec only says that they SHOULD do.
> > > > > > > > > >
> > > > > > > > > > Yes the local setter taking the advertised value may
> > > > > > > > > > have
> > > > > > > > > > been
> > > > > > > > > > better
> > > > > > > > > > for method consistency with the remote getter. On the
> > > > > > > > > > other
> > > > > > > > > > hand,
> > > > > > > > > > sending of necessary heartbeats is handled directly
> > > > > > > > > > by the
> > > > > > > > > > transport
> > > > > > > > > > during the tick process, so users may not necessarily
> > > > > > > > > > even
> > > > > > > > > > use
> > > > > > > > > > the
> > > > > > > > > > getter themselves, and proton uses that remote value
> > > > > > > > > > internally
> > > > > > > > > > by
> > > > > > > > > > pessimistically halfing it to account for the case
> > > > > > > > > > that
> > > > > > > > > > folks on
> > > > > > > > > > the
> > > > > > > > > > other end did not advertise half their actual timeout
> > > > > > > > > > (since the
> > > > > > > > > > spec
> > > > > > > > > > doesnt require that they do). Side note: proton could
> > > > > > > > > > arguably be
> > > > > > > > > > less
> > > > > > > > > > pessimistic here and go for say a percentage much
> > > > > > > > > > nearer
> > > > > > > > > > the full
> > > > > > > > > > advertised value, but then you'd probably need to
> > > > > > > > > > start
> > > > > > > > > > guaging
> > > > > > > > > > how
> > > > > > > > > > close is too close.
> > > > > > > > > >
> > > > > > > > > > I think ensuring the doccumentation on the methods is
> > > > > > > > > > clear
> > > > > > > > > > what
> > > > > > > > > > they
> > > > > > > > > > do is sufficient enough here. I actually prefer idle-
> > > > > > > > > > timeout as
> > > > > > > > > > an
> > > > > > > > > > name rather than heartbeat due to the way this all
> > > > > > > > > > works.
> > > > > > > > > > Since
> > > > > > > > > > you
> > > > > > > > > > only tell the other side [half] your timeout, you
> > > > > > > > > > dont
> > > > > > > > > > actually
> > > > > > > > > > have
> > > > > > > > > > direct control over when they send any needed empty
> > > > > > > > > > frames
> > > > > > > > > > to
> > > > > > > > > > satisfy
> > > > > > > > > > it (as the above shows, we might send them more often
> > > > > > > > > > than
> > > > > > > > > > they
> > > > > > > > > > require) and 'heartbeat' might seem to imply that you
> > > > > > > > > > do,
> > > > > > > > > > and
> > > > > > > > > > possibly
> > > > > > > > > > even that they need be sent at that period all the
> > > > > > > > > > time
> > > > > > > > > > even
> > > > > > > > > > despite
> > > > > > > > > > regular traffic, which is not the case.
> > > > > > > > > >
> > > > > > > > > > Robbie
> > > > > > > > > >
> > > > > > > > > > ---------------------------------------------------
> > > > > > > > > > ------
> > > > > > > > > > ------
> > > > > > > > > > ------
> > > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.o
> > > > > > > > > > rg
> > > > > > > > > > For additional commands, e-mail: dev-help@qpid.apache
> > > > > > > > > > .org
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > -----------------------------------------------------
> > > > > > > > > ------
> > > > > > > > > ------
> > > > > > > > > ----
> > > > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > > > For additional commands, e-mail: dev-help@qpid.apache.o
> > > > > > > > > rg
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ---------------------------------------------------------
> > > > > > > ------
> > > > > > > ------
> > > > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > > > >
> > > > > > >
> > > > >
> > > > > -------------------------------------------------------------
> > > > > --------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------
> > > > ------
> > > > To unsubscribe, e-mail: dev-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: dev-help@qpid.apache.org
> > > >
> > > >
> > >
> > > --
> > > -K
> > >
> > > -----------------------------------------------------------------
> > > ----
> > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: users-help@qpid.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > For additional commands, e-mail: users-help@qpid.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>