Posted to dev@openwhisk.apache.org by Markus Thoemmes <ma...@de.ibm.com> on 2018/07/03 13:47:33 UTC

Connection reuse in the Invoker (HttpUtils)

Hi OpenWhiskers,

I (and potentially Tyson in https://github.com/apache/incubator-openwhisk/issues/3151) found a pretty nasty bug. It is very sporadic, but hear me out:

Today, we employ a connection pool when talking to our user-containers. The assumption is that connections are reused between requests, as a performance improvement. If a container is not used for some time (the pause-grace), we pause it, effectively freezing the container. The container (and its socket) will not answer any requests coming from the outside for as long as it is paused.

When that container is to be used again, we resume it and then try to reuse one of the connections I talked about earlier. These connections can go stale, though: the server might close them without notice, and my expectation is that pausing/resuming adds another layer of uncertainty about the state of the connection. The PoolingConnectionManager does a staleness check once a connection has been idle for 2 seconds. That check involves writing into the socket and observing what happens. Per the documentation (https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/config/RequestConfig.html), it is an expensive procedure that can take up to 30ms, which in my opinion completely negates the performance optimization it is geared towards. After all, we might not be reusing many connections anyway if they go stale quickly due to pause/resume.
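
For context, the 2-second window comes from how HttpClient's pooling connection manager is configured. Roughly like this (a sketch with made-up timeout values, not the exact HttpUtils code):

    import org.apache.http.client.config.RequestConfig
    import org.apache.http.impl.client.HttpClients
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager

    object StalenessCheckSketch {
      // Connections that sat idle in the pool for longer than this are
      // re-validated (a write to the socket) before being handed out again.
      val cm = new PoolingHttpClientConnectionManager()
      cm.setValidateAfterInactivity(2000) // the 2-second staleness window

      val requestConfig = RequestConfig.custom()
        .setConnectTimeout(1000)   // illustrative values only
        .setSocketTimeout(60000)
        .build()

      val client = HttpClients.custom()
        .setConnectionManager(cm)
        .setDefaultRequestConfig(requestConfig)
        .build()
    }

The up-to-30ms cost from the documentation is paid on every such re-validation of an idle connection.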

Furthermore, there is this problem (per: https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html):

> HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, 
> prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable. The only feasible solution that does not 
> involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a 
> long period of inactivity. The monitor thread can periodically call ClientConnectionManager#closeExpiredConnections() method to close all expired connections 
> and evict closed connections from the pool. It can also optionally call ClientConnectionManager#closeIdleConnections() method to close all connections that 
> have been idle over a given period of time.
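
For completeness, the monitor thread they describe would look roughly like this (just a sketch, nothing we have in HttpUtils today):

    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager

    object IdleConnectionEvictor {
      // Periodically evict expired and long-idle connections from the pool,
      // as the HttpClient tutorial suggests.
      def start(cm: PoolingHttpClientConnectionManager): Unit = {
        val scheduler = Executors.newSingleThreadScheduledExecutor()
        scheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = {
            cm.closeExpiredConnections()
            cm.closeIdleConnections(5, TimeUnit.SECONDS) // arbitrary idle threshold
          }
        }, 5, 5, TimeUnit.SECONDS)
      }
    }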

I think in our case this is a bit overengineered and propose the following:

We throw away all connections in the pool upon pausing the container. A connection will be reestablished after resuming it. This still reuses connections during bursts (within the pause-grace) but safely recreates connections across a pause/unpause. As I said above, this might be pretty close to the behavior we're getting anyway, but it makes things less prone to timing errors and much more explicit.
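
In code, the idea is roughly the following (hypothetical names; the actual hook would need to be wired into the container lifecycle in the invoker):

    import java.util.concurrent.TimeUnit
    import org.apache.http.impl.conn.PoolingHttpClientConnectionManager

    // Hypothetical hook, called right before the invoker pauses a container.
    // Dropping all pooled (idle) connections here means a resumed container
    // always gets a freshly established connection instead of a stale one.
    class ContainerClient(cm: PoolingHttpClientConnectionManager) {
      def onPause(): Unit =
        cm.closeIdleConnections(0, TimeUnit.MILLISECONDS)
    }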

These stale connections might well explain the strange issues we had when we first used akka-http in our invoker -> user-container communication. @Tyson, this would then have an impact on the work you're doing.

Let me know what you think,
-m



Re: Connection reuse in the Invoker (HttpUtils)

Posted by Michele Sciabarra <mi...@sciabarra.com>.
Did you try the new Go proxy? How could I test persistent connection handling?
--
  Michele Sciabarra
  michele@sciabarra.com



On Wed, Jul 4, 2018, at 8:35 PM, Markus Thoemmes wrote:

Re: Connection reuse in the Invoker (HttpUtils)

Posted by Markus Thoemmes <ma...@de.ibm.com>.
Hi all,

fixing the subtle bugs in HttpUtils has led to quite an odyssey. We needed to roll back the changes, mostly because of latency regressions caught by the tests in the main repository.

These regressions seemed to apply only to runtimes other than node.js, which caused me to investigate those. My theory is that neither our Python proxy (used by Swift etc. as well) nor the Java proxy properly supports persistent connections today. Handling requests properly on the client side (closing the entity stream) turned out to add a whole lot of latency for those runtimes only (10-20ms per request).
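
By "closing the entity stream" I mean client-side handling along these lines (a sketch, not the exact HttpUtils code):

    import org.apache.http.client.methods.HttpPost
    import org.apache.http.impl.client.CloseableHttpClient
    import org.apache.http.util.EntityUtils

    object EntityHandlingSketch {
      // Fully consume the entity and close the response so the connection is
      // released in a clean state rather than left half-read.
      def runActivation(client: CloseableHttpClient, request: HttpPost): String = {
        val response = client.execute(request)
        try EntityUtils.toString(response.getEntity)
        finally response.close()
      }
    }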

I then went ahead and disabled connection reuse entirely (it shouldn't be working correctly today anyway, due to the missing closes). That brought the latency back to normal across all runtimes.
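
With HttpClient that is essentially a one-liner, roughly (again only a sketch; the actual change is in the PR linked below):

    import org.apache.http.impl.NoConnectionReuseStrategy
    import org.apache.http.impl.client.{CloseableHttpClient, HttpClients}

    object NoReuseClient {
      // Every response is followed by a connection close, so each activation
      // request opens a fresh connection to the user-container.
      def build(): CloseableHttpClient =
        HttpClients.custom()
          .setConnectionReuseStrategy(NoConnectionReuseStrategy.INSTANCE)
          .build()
    }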

A good side effect is that this resolves any issues around checking connections for staleness and/or them breaking because of the pause/resume lifecycle discussed in this thread. The added latency of always establishing a new connection was not measurable in the latency tests; it is probably <= 1ms. Compared to the benefit of always being "correct" and enabling the other bugfixes needed in HttpUtils (properly closing responses and entities), I feel we should disable connection reuse until we find it to be a bottleneck that warrants the added complexity of handling connection lifecycles properly (as discussed here).

The new implementation is to be found here: https://github.com/apache/incubator-openwhisk/pull/3843

Cheers
-mt