Posted to proton@qpid.apache.org by Dominic Evans <do...@uk.ibm.com> on 2015/04/01 12:00:56 UTC

Idle Timeout of a Connection

2.4.5 Idle Timeout Of A Connection

"To avoid spurious timeouts, the value in idle-time-out SHOULD be half the peer's actual timeout threshold" 

So, to me, this means on the @open performative the client should flow (e.g.,) 30000 as the idleTimeOut it would like to negotiate, but should actually only enforce that data is received from the other end within 60000 milliseconds before it closes the session+connection.

However, if that is the case, then the code in proton-c (pn_tick_amqp in transport.c) and proton-j (#tick() in TransportImpl.java) would appear to be doing the wrong thing?
Currently it *halves* the advertised remote_idle_timeout of the peer in order to determine what deadline to adhere to for sending empty keepalive frames to the remote end.
Similarly it uses its local_idle_timeout as-is to determine if the remote end hasn't sent data recently enough (closing the link with resource-limit-exceeded when the deadline elapses). This would seem to mean that empty frames are being sent twice as often as they need to be, and resource-limit-exceeded is being fired too soon.

It seems to me that instead it should use remote_idle_timeout as-is for determining the deadline for sending data, and the local_idle_timeout specified by the client user should either be doubled when determining the deadline or halved before sending it in the @open frame.
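
To make the proposal above concrete, here is a minimal sketch in C. It is not the proton-c code: the type and function names are hypothetical, and it only assumes what the paragraph states, namely that keepalives are paced by the peer's advertised value as-is, and that the user's local value is either doubled for enforcement (option A) or halved before being advertised (option B).

```c
#include <stdint.h>

typedef uint32_t millis_t; /* hypothetical stand-in for pn_millis_t */

/* Interval for sending empty frames: the peer's advertised value, as-is. */
millis_t keepalive_interval(millis_t remote_idle_timeout) {
    return remote_idle_timeout;
}

/* Option A: advertise the user's value unchanged, enforce double it. */
millis_t advertised_option_a(millis_t local_idle_timeout) {
    return local_idle_timeout;
}

millis_t enforced_option_a(millis_t local_idle_timeout) {
    /* Saturate rather than wrap around on very large values. */
    return local_idle_timeout > UINT32_MAX / 2 ? UINT32_MAX
                                               : local_idle_timeout * 2;
}

/* Option B: advertise half the user's value, enforce it unchanged. */
millis_t advertised_option_b(millis_t local_idle_timeout) {
    return local_idle_timeout / 2;
}

millis_t enforced_option_b(millis_t local_idle_timeout) {
    return local_idle_timeout;
}
```

With a user-supplied value of 60000 under option B, 30000 goes on the wire and the connection is only closed after 60000 ms of silence, which matches the reading of 2.4.5 above.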

Thoughts?

Cheers,
Dom
-- 

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU

Re: Idle Timeout of a Connection

Posted by Ken Giusti <kg...@redhat.com>.


----- Original Message -----
> From: "Dominic Evans" <do...@uk.ibm.com>
> To: proton@qpid.apache.org
> Sent: Wednesday, April 1, 2015 1:55:08 PM
> Subject: Re: Idle Timeout of a Connection
> 
> -----Ken Giusti <kg...@redhat.com> wrote: -----
> > I've gone back and forth about what the proper behavior should be for
> > Proton re: idle timeout.
> > 
> > It's that darn pesky "SHOULD"... which means 'recommended', not
> > exactly 'required'.
> 
> Yes, I was similarly surprised that the spec chose to say 'SHOULD' rather
> than explicitly stating exactly what the enforcements should be.
> 
> > So the current impl takes the conservative approach and assumes the
> > peer may have advertised the actual timeout (i.e. not half).
> > 
> > To prevent spurious timeouts, idle frames are sent (if necessary) at
> > (timeout)/2 since the last transmitted frame. Sending at exactly
> > (timeout) risks missing the peer's timeout if they did not advertise
> > half their actual timeout. This allows proton to be liberal in
> > respects to how the peer may have implemented idle time, at the
> > expense of potentially doubling the idle frame transmission rate. I
> > feel this is a justifiable tradeoff in an effort to keep connections
> > up in the face of different interpretations of the spec.
> 
> So I'm reasonably happy to leave this as-is. Yes, it does mean we are
> potentially sending empty frames twice as often as we need to, but that's
> never going to break the bank and it gives us the security that we will
> rarely ever lose a remote client due to idle timeout.
> 

In all fairness, it could be argued that implementations that do not advertise 1/2 their timeout value should be fixed.   In that case, changing proton to generate idle frames at the advertised timeout interval will help uncover these problems.



> > As far as the local setting is concerned, the API doesn't indicate
> > that the value supplied by the application should be half the actual
> > timeout. In other words, users of the API should expect their
> > specified timeout to be used as given, not doubled (unless we expect
> > users of the API to be experts on the standard, which I didn't think
> > was the case).
> 
> I'm less inclined to agree on this one. If we are going for the conservative
> approach in the sending of keepalive frames from proton, I think we should
> also go for the conservative approach when enforcing the receipt of keepalive
> frames before closing the session. If I were to write an AMQP 1.0 broker
> based upon the spec, I could quite reasonably assume that I only need to send
> those frames as often as the remote end requested them in the @open.
> 
> > Currently, proton does advertise 1/2 of this value, so proton is in fact
> > following the recommendation in the spec.
> 
> Aha, I hadn't spotted that in proton-c's transport.c
> 
> // as per the recommendation in the spec, advertise half our
> // actual timeout to the remote
> const pn_millis_t idle_timeout = transport->local_idle_timeout
>     ? (transport->local_idle_timeout/2)
>     : 0;
> 
> Currently proton-j just advertises the localIdleTimeout as-is. So perhaps the
> immediate short term fix here is to also have proton-j advertise half the
> supplied localIdleTimeout value to at least make it match what proton-c does.
> 

Ah, so it's possible that having the C impl send twice as often actually hid this discrepancy? 
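
One way to see how the discrepancy could have been masked, as a rough sketch only: the helpers below are hypothetical and not part of proton, network latency is ignored, and the receiver is assumed to advertise its timeout as-is and enforce that same value, as proton-j is described as doing in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: slack in milliseconds between a sender's idle-frame
 * interval and the deadline the receiver enforces. Zero or negative slack
 * means an idle frame may arrive too late. */
int64_t keepalive_slack_ms(uint32_t send_interval_ms,
                           uint32_t enforced_timeout_ms) {
    return (int64_t)enforced_timeout_ms - (int64_t)send_interval_ms;
}

/* Sender that halves the advertised value, as proton-c is described
 * as doing above. */
uint32_t halved_send_interval_ms(uint32_t advertised_ms) {
    return advertised_ms / 2;
}

/* Sender that uses the advertised value as-is, which a reading of the
 * spec would permit. */
uint32_t as_is_send_interval_ms(uint32_t advertised_ms) {
    return advertised_ms;
}
```

For a receiver that advertises and enforces 60000, the halving sender is left with 30000 ms of slack, while a sender using the value as-is has none.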

> 
> Cheers,
> Dom
> 

-- 
-K

Re: Idle Timeout of a Connection

Posted by Dominic Evans <do...@uk.ibm.com>.

-----Ken Giusti <kg...@redhat.com> wrote: -----
> I've gone back and forth about what the proper behavior should be for
> Proton re: idle timeout.
> 
> It's that darn pesky "SHOULD"... which means 'recommended', not
> exactly 'required'.

Yes, I was similarly surprised that the spec chose to say 'SHOULD' rather than
explicitly stating exactly what the enforcements should be.

> So the current impl takes the conservative approach and assumes the
> peer may have advertised the actual timeout (i.e. not half).
> 
> To prevent spurious timeouts, idle frames are sent (if necessary) at
> (timeout)/2 since the last transmitted frame. Sending at exactly
> (timeout) risks missing the peer's timeout if they did not advertise
> half their actual timeout. This allows proton to be liberal in
> respects to how the peer may have implemented idle time, at the
> expense of potentially doubling the idle frame transmission rate. I
> feel this is a justifiable tradeoff in an effort to keep connections
> up in the face of different interpretations of the spec.

So I'm reasonably happy to leave this as-is. Yes, it does mean we are
potentially sending empty frames twice as often as we need to, but that's never
going to break the bank and it gives us the security that we will rarely ever
lose a remote client due to idle timeout.

> As far as the local setting is concerned, the API doesn't indicate
> that the value supplied by the application should be half the actual
> timeout. In other words, users of the API should expect their
> specified timeout to be used as given, not doubled (unless we expect
> users of the API to be experts on the standard, which I didn't think
> was the case).

I'm less inclined to agree on this one. If we are going for the conservative
approach in the sending of keepalive frames from proton, I think we should also
go for the conservative approach when enforcing the receipt of keepalive
frames before closing the session. If I were to write an AMQP 1.0 broker based
upon the spec, I could quite reasonably assume that I only need to send those
frames as often as the remote end requested them in the @open.

> Currently, proton does advertise 1/2 of this value, so proton is in fact
> following the recommendation in the spec.

Aha, I hadn't spotted that in proton-c's transport.c

// as per the recommendation in the spec, advertise half our
// actual timeout to the remote
const pn_millis_t idle_timeout = transport->local_idle_timeout
    ? (transport->local_idle_timeout/2)
    : 0;

Currently proton-j just advertises the localIdleTimeout as-is. So perhaps the
immediate short term fix here is to also have proton-j advertise half the
supplied localIdleTimeout value to at least make it match what proton-c does.

Cheers,
Dom

Re: Idle Timeout of a Connection

Posted by Ken Giusti <kg...@redhat.com>.

Hi,

I've gone back and forth about what the proper behavior should be for Proton re: idle timeout.

It's that darn pesky "SHOULD"...  which means 'recommended', not exactly 'required'.

So the current impl takes the conservative approach and assumes the peer may have advertised the actual timeout (i.e. not half).

To prevent spurious timeouts, idle frames are sent (if necessary) at (timeout)/2 since the last transmitted frame.  Sending at exactly (timeout) risks missing the peer's timeout if they did not advertise half their actual timeout.  This allows proton to be liberal in respects to how the peer may have implemented idle time, at the expense of potentially doubling the idle frame transmission rate.  I feel this is a justifiable tradeoff in an effort to keep connections up in the face of different interpretations of the spec.

As far as the local setting is concerned, the API doesn't indicate that the value supplied by the application should be half the actual timeout.   In other words, users of the API should expect their specified timeout to be used as given, not doubled (unless we expect users of the API to be experts on the standard, which I didn't think was the case).   Currently, proton does advertise 1/2 of this value, so proton is in fact following the recommendation in the spec.
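
The behaviour described above can be summarised in a small sketch. This is an illustration under the assumptions stated in this message, not the actual transport.c logic, and the struct and function names are hypothetical: the local value is advertised at half, enforced as given, and idle frames are sent at half of whatever the peer advertised.

```c
#include <stdint.h>

/* Hypothetical summary of the three values derived from the idle timeouts. */
typedef struct {
    uint32_t advertised_ms; /* value sent to the peer in the @open frame */
    uint32_t enforced_ms;   /* silence tolerated before closing */
    uint32_t keepalive_ms;  /* interval between outgoing idle frames */
} idle_plan_t;

idle_plan_t current_idle_plan(uint32_t local_idle_timeout_ms,
                              uint32_t remote_idle_timeout_ms) {
    idle_plan_t plan;
    plan.advertised_ms = local_idle_timeout_ms / 2;  /* spec recommendation */
    plan.enforced_ms = local_idle_timeout_ms;        /* used as given */
    plan.keepalive_ms = remote_idle_timeout_ms / 2;  /* conservative pacing */
    return plan;
}
```

With a local setting of 60000 and a peer that advertises 30000, this advertises 30000, enforces 60000, and sends idle frames every 15000 ms.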

-K

----- Original Message -----
> From: "Dominic Evans" <do...@uk.ibm.com>
> To: proton@qpid.apache.org
> Sent: Wednesday, April 1, 2015 7:46:06 AM
> Subject: Re: Idle Timeout of a Connection
> 
> 
> -----Rafael Schloming <rh...@alum.mit.edu> wrote: -----
> 
> > On Wed, Apr 1, 2015 at 6:00 AM, Dominic Evans <do...@uk.ibm.com>
> > wrote:
> >> 2.4.5 Idle Timeout Of A Connection
> >>
> >> Thoughts?
> >>
> >
> > I believe your interpretation is correct. I've certainly noticed idle
> > frames being sent significantly more often than I would have expected, but
> > I haven't had time to dig into the cause.
> 
> OK good. I'll put together the changes and submit them via a JIRA + PR.
> 
> Cheers,
> Dom
> 

-- 
-K

Re: Idle Timeout of a Connection

Posted by Dominic Evans <do...@uk.ibm.com>.

-----Rafael Schloming <rh...@alum.mit.edu> wrote: -----

> On Wed, Apr 1, 2015 at 6:00 AM, Dominic Evans <do...@uk.ibm.com> wrote:
>> 2.4.5 Idle Timeout Of A Connection
>>
>> Thoughts?
>>
>
> I believe your interpretation is correct. I've certainly noticed idle frames
> being sent significantly more often than I would have expected, but I haven't
> had time to dig into the cause.

OK good. I'll put together the changes and submit them via a JIRA + PR.

Cheers,
Dom

Re: Idle Timeout of a Connection

Posted by Rafael Schloming <rh...@alum.mit.edu>.

On Wed, Apr 1, 2015 at 6:00 AM, Dominic Evans <do...@uk.ibm.com>
wrote:

> 2.4.5 Idle Timeout Of A Connection
>
> "To avoid spurious timeouts, the value in idle-time-out SHOULD be half the
> peer's actual timeout threshold"
>
> So, to me, this means on the @open performative the client should flow
> (e.g.,) 30000 as the idleTimeOut it would like to negotiate, but should
> actually only enforce that data is received from the other end within 60000
> milliseconds before it closes the session+connection.
>
> However, if that is the case, then the code in proton-c (pn_tick_amqp in
> transport.c) and proton-j (#tick() in TransportImpl.java) would appear to
> be doing the wrong thing?
> Currently it *halves* the advertised remote_idle_timeout of the peer in
> order to determine what deadline to adhere to for sending empty keepalive
> frames to the remote end.
> Similarly it uses its local_idle_timeout as-is to determine if the remote
> end hasn't sent data recently enough (closing the link with
> resource-limit-exceeded when the deadline elapses). This would seem to mean
> that empty frames are being sent twice as often as they need to be, and
> resource-limit-exceeded is being fired too soon.
>
> It seems to me that instead it should use remote_idle_timeout as-is for
> determining the deadline for sending data, and the local_idle_timeout
> specified by the client user should either be doubled when determining the
> deadline or halved before sending it in the @open frame.
>
> Thoughts?
>

I believe your interpretation is correct. I've certainly noticed idle
frames being sent significantly more often than I would have expected, but
I haven't had time to dig into the cause.

--Rafael