Posted to dev@httpd.apache.org by "Plüm, Rüdiger, VF-Group" <ru...@vodafone.com> on 2008/09/05 16:21:49 UTC

Behaviour of mod_proxy_ajp if CPING/CPONG fails

IMHO the current behaviour of mod_proxy_ajp is not correct if a CPING/CPONG fails
on a backend connection. In this case the status is set to HTTP_SERVICE_UNAVAILABLE
and the scheme handler returns. In the case of an unbalanced backend this results
in an HTTP_SERVICE_UNAVAILABLE (503) being returned to the client. But a failing
CPING/CPONG can be caused by a faulty AJP connection that was closed by the backend,
or by a race condition between sending the CPING and the backend closing the connection.
So after the first failed CPING/CPONG we should try again with a new TCP connection
and should only return HTTP_SERVICE_UNAVAILABLE if this fails as well. The following
(and attached) patch does exactly that. Thoughts on the above and/or on the patch
before I commit?


Index: modules/proxy/mod_proxy_ajp.c
===================================================================
--- modules/proxy/mod_proxy_ajp.c       (revision 692409)
+++ modules/proxy/mod_proxy_ajp.c       (working copy)
@@ -554,6 +554,7 @@
     conn_rec *origin = NULL;
     proxy_conn_rec *backend = NULL;
     const char *scheme = "AJP";
+    int retry;
     proxy_dir_conf *dconf = ap_get_module_config(r->per_dir_config,
                                                  &proxy_module);

@@ -597,43 +598,53 @@
     backend->is_ssl = 0;
     backend->close = 0;

-    /* Step One: Determine Who To Connect To */
-    status = ap_proxy_determine_connection(p, r, conf, worker, backend,
-                                           uri, &url, proxyname, proxyport,
-                                           server_portstr,
-                                           sizeof(server_portstr));
+    retry = 0;
+    while (retry < 2) {
+        /* Step One: Determine Who To Connect To */
+        status = ap_proxy_determine_connection(p, r, conf, worker, backend,
+                                               uri, &url, proxyname, proxyport,
+                                               server_portstr,
+                                               sizeof(server_portstr));

-    if (status != OK)
-        goto cleanup;
+        if (status != OK)
+            break;

-    /* Step Two: Make the Connection */
-    if (ap_proxy_connect_backend(scheme, backend, worker, r->server)) {
-        ap_log_error(APLOG_MARK, APLOG_ERR, 0, r->server,
-                     "proxy: AJP: failed to make connection to backend: %s",
-                     backend->hostname);
-        status = HTTP_SERVICE_UNAVAILABLE;
-        goto cleanup;
-    }
-
-    /* Handle CPING/CPONG */
-    if (worker->ping_timeout_set) {
-        status = ajp_handle_cping_cpong(backend->sock, r,
-                                        worker->ping_timeout);
-        if (status != APR_SUCCESS) {
-            backend->close++;
-            ap_log_error(APLOG_MARK, APLOG_ERR, status, r->server,
-                         "proxy: AJP: cping/cpong failed to %pI (%s)",
-                         worker->cp->addr,
-                         worker->hostname);
+        /* Step Two: Make the Connection */
+        if (ap_proxy_connect_backend(scheme, backend, worker, r->server)) {
+            ap_log_error(APLOG_MARK, APLOG_ERR, 0, r->server,
+                         "proxy: AJP: failed to make connection to backend: %s",
+                         backend->hostname);
             status = HTTP_SERVICE_UNAVAILABLE;
-            goto cleanup;
+            break;
         }
+
+        /* Handle CPING/CPONG */
+        if (worker->ping_timeout_set) {
+            status = ajp_handle_cping_cpong(backend->sock, r,
+                                            worker->ping_timeout);
+            /*
+             * In case the CPING / CPONG failed for the first time we might be
+             * just out of luck and got a faulty backend connection, but the
+             * backend might be healthy nevertheless. So ensure that the backend
+             * TCP connection gets closed and try it once again.
+             */
+            if (status != APR_SUCCESS) {
+                backend->close++;
+                ap_log_error(APLOG_MARK, APLOG_ERR, status, r->server,
+                             "proxy: AJP: cping/cpong failed to %pI (%s)",
+                             worker->cp->addr,
+                             worker->hostname);
+                status = HTTP_SERVICE_UNAVAILABLE;
+                retry++;
+                continue;
+            }
+        }
+        /* Step Three: Process the Request */
+        status = ap_proxy_ajp_request(p, r, backend, origin, dconf, uri, url,
+                                      server_portstr);
+        break;
     }
-    /* Step Three: Process the Request */
-    status = ap_proxy_ajp_request(p, r, backend, origin, dconf, uri, url,
-                                  server_portstr);

-cleanup:
     /* Do not close the socket */
     ap_proxy_release_connection(scheme, backend, r->server);
     return status;

Regards

Rüdiger

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Jim Jagielski <ji...@jaguNET.com>.
IMO, the suggested patch does what it's supposed to do and doesn't
cause any regressions. As such, +1 for folding into trunk.

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Ruediger Pluem <rp...@apache.org>.

On 09/07/2008 12:43 PM, Rainer Jung wrote:
> Ruediger Pluem schrieb:
>>
>> On 09/06/2008 10:54 PM, Rainer Jung wrote:
>>> Rüdiger Plüm schrieb:

>>
>>> But in case the user of the connection knows, that it's broken, and
>>> closes it, it would make more sense to put in under the stack, since it
>>> is no longer connected.
>> Why? IMHO this defers only the efforts that need to be done anyway to
>> create
>> a new TCP connection. And keep in mind that even in the case that I put the
>> faulty connection back *under* the stack the next connection on top of the
>> stack was even longer idle then the one that was faulty. So it is likely to
>> be faulty as well. It might be the case though that this faultiness is
>> detected earlier (in the TCP connection check) and thus our next
>> CPING/CPONG
>> in the loop happens with a fine fresh TCP connection.
> 
> Yes, I think that's the question: if one CPING fails, what do you want
> to do with the remaining connections in the pool?
> 
> - do you assume they are broken as well and close them directly (and
> afterwards try to open a new one)
> - do you want to test them immediately (and if one of them is still OK
> maybe end the testing and use it)
> - you don't care and simply try to open a new one.
> 
> Concerning the argument that the next connection on the step would be
> even longer idle: yes, unless another thread returned one to the pool in
> the meantime.

Of course. This could have happened, but as this connection needs to be fixed
sooner or later anyway, I think I should do it now.
I guess this is a matter of assumptions and probability:
If you assume that a healthy connection was put back in the reslist by another
thread and that the broken connection won't be used in the near future
anyway, then the approach of getting another one from the reslist makes sense.
If you assume that you get the broken one back anyway and that it will be needed
in the near future, the other approach makes sense.
As you may notice, I am leaning towards the second assumption :-).

> 
> My personal opinion (based on mod_jk's connection handling code): CPING
> failure is very rare. In most cases, it indicates connection drop has
> happened by a firewall, in most remaining cases the backend is in a very

I recently had a situation on one of my systems (JBOSS 4.0.x with Tomcat 5.5,
classic connector) where this wasn't true AFAICT. Both httpd
and JBOSS are on the same box, so there was definitely no firewall / network
issue that caused the problem. CPINGs failed with a timeout, but new
connections worked fine, I couldn't find any blocked processor threads
in the thread dump, and load and GC activity weren't very high. That was the
starting point for my patch, as I thought that a failed CPING should not pass
a final verdict on the request, but should trigger one more try. I fixed this
temporarily with a somewhat tricky LB configuration over the single backend
with one retry attempt. But to be honest, I do not know what caused this
strange situation. As the JBOSS version, and thus the Tomcat version, is quite
aged, it is possible that there is a bug in the classic connector that has been
fixed in the meantime. But I am straying from the subject of the thread here.

> badly broken state. Both situations would result in nearly all remaining
> connections would be broken as well, but not necessarily all (in the
> firewall case, there might be less idle connections coming back from
> other threads). So a good reaction to a CPING failure would be a pool
> wide connection check and using a new connection.

In general I agree, but doing this in the scope of the request is IMHO too
time consuming and expensive.

> 
> If you are afraid, that the check of all connections in the pool takes
> to long (maybe running into TCP timeouts), you could directly try a

That is what the patch does.

> fresh connection, and set an indicator in the pool, that the maintenance
> task, which is usually only looking for idle connections to close,
> should additionally do a function check for all connections. That would
> be non-critical concerning latency for requests, once maintenance runs
> decoupled from a request (like what Mladen suggested, either in a
> separate thread or using the monitor hook).

This seems like a nice idea once we have some kind of maintenance "thread".
But I am not sure how this can be done with the current reslist implementation
because of its stack character. Keep in mind that we cannot extend the API of
the reslist until APR-UTIL 1.4.0 and cannot change it until APR-UTIL 2.0.
So this could prove to be tricky.

>> One question as you are more familiar with the AJP server code on Tomcat
>> side:
>> If a connector closes down a connection due to its idleness does it send
>> any
>> kind of AJP shutdown package via the TCP connection or does it just
>> close the
>> socket like in the HTTP keepalive case?
> 
> Unfortunately the AJP13 protocol is a little weak on connection
> handling. There is no message indicating the backend shuts down its
> connection. So it just closes the socket.

Thanks for pointing this out. This is all I wanted to know. If there were unread
data (an AJP connection close packet), the detection of whether the remote side
closed the socket wouldn't work.

Regards

Rüdiger

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Rainer Jung <ra...@kippdata.de>.
Ruediger Pluem schrieb:
> 
> 
> On 09/06/2008 10:54 PM, Rainer Jung wrote:
>> Rüdiger Plüm schrieb:
>>>
>>> On 09/05/2008 06:21 PM, Mladen Turk wrote:
>>>> Plüm, Rüdiger, VF-Group wrote:
>>>>>  
> 
>>>> while (apr_proxy_acguire_connection) {
>>>>    fresh = 0
>>>>    if (conn->sock == NULL) {
>>>>       fresh = 1
>>>>    }
>>>>    ap_proxy_determine_connection
>>>>    ap_proxy_connect_to_backend
>>>>    if (!ajp_handle_cping_cpong) {
>>>>         CPING/CPONG failed. Mark the connection for closure.
>>>>     conn->close++;
>>>>         ap_proxy_release_connection
>>>>         if (fresh) {
>>>>            CPING/CPONG failed on fresh connection. bail out.
>>>>            return 503;
>>>>     }
>>>>    }
>>>>    else {
>>>>       CPING/CPONG OK.
>>>>       break;
>>>>    }
>>>> }
>>>> go on with socket
>>> As I said: Due to the fact that the reslist is a stack it results
>>> effectively in the
>>> same thing as my code does. This is because the acquire_connection call
>>> will get
>>> the same faulty (but then closed connection) that the previous
>>> ap_proxy_release_connection
>>> placed back in the reslist.
>>
>> Maybe I'm missing something here:
>>
>> The stack design is useful, because it allows for idle connections to
>> trigger the idle timeout. The most recently used connection gets reused
>> first.
> 
> Exactly.
> 
>>
>> But in case the user of the connection knows, that it's broken, and
>> closes it, it would make more sense to put in under the stack, since it
>> is no longer connected.
> 
> Why? IMHO this defers only the efforts that need to be done anyway to
> create
> a new TCP connection. And keep in mind that even in the case that I put the
> faulty connection back *under* the stack the next connection on top of the
> stack was even longer idle then the one that was faulty. So it is likely to
> be faulty as well. It might be the case though that this faultiness is
> detected earlier (in the TCP connection check) and thus our next
> CPING/CPONG
> in the loop happens with a fine fresh TCP connection.

Yes, I think that's the question: if one CPING fails, what do you want
to do with the remaining connections in the pool?

- do you assume they are broken as well and close them directly (and
afterwards try to open a new one)
- do you want to test them immediately (and if one of them is still OK
maybe end the testing and use it)
- you don't care and simply try to open a new one.

Concerning the argument that the next connection on the stack would be
even longer idle: yes, unless another thread returned one to the pool in
the meantime.

My personal opinion (based on mod_jk's connection handling code): CPING
failure is very rare. In most cases it indicates that a connection drop has
been caused by a firewall; in most remaining cases the backend is in a very
badly broken state. In both situations nearly all remaining connections
would be broken as well, but not necessarily all (in the firewall case,
there might be fewer idle connections coming back from other threads). So a
good reaction to a CPING failure would be a pool-wide connection check and
using a new connection.

If you are afraid that the check of all connections in the pool takes
too long (maybe running into TCP timeouts), you could directly try a
fresh connection, and set an indicator in the pool that the maintenance
task, which usually only looks for idle connections to close, should
additionally do a function check on all connections. That would be
non-critical concerning request latency, once maintenance runs
decoupled from requests (like what Mladen suggested, either in a
separate thread or using the monitor hook).

Just my 0.2 Euro Cents.

> One question as you are more familiar with the AJP server code on Tomcat
> side:
> If a connector closes down a connection due to its idleness does it send
> any
> kind of AJP shutdown package via the TCP connection or does it just
> close the
> socket like in the HTTP keepalive case?

Unfortunately the AJP13 protocol is a little weak on connection
handling. There is no message indicating the backend shuts down its
connection. So it just closes the socket.

There is a message in the other direction, for when the AJP client wants to
shut down a connection and inform the backend, but it is not used in mod_jk,
and I assume not in mod_proxy_ajp either.

Regards,

Rainer

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Ruediger Pluem <rp...@apache.org>.

On 09/06/2008 10:54 PM, Rainer Jung wrote:
> Rüdiger Plüm schrieb:
>>
>> On 09/05/2008 06:21 PM, Mladen Turk wrote:
>>> Plüm, Rüdiger, VF-Group wrote:
>>>>  

>>> while (apr_proxy_acguire_connection) {
>>>    fresh = 0
>>>    if (conn->sock == NULL) {
>>>       fresh = 1
>>>    }
>>>    ap_proxy_determine_connection
>>>    ap_proxy_connect_to_backend
>>>    if (!ajp_handle_cping_cpong) {
>>>         CPING/CPONG failed. Mark the connection for closure.
>>>     conn->close++;
>>>         ap_proxy_release_connection
>>>         if (fresh) {
>>>            CPING/CPONG failed on fresh connection. bail out.
>>>            return 503;
>>>     }
>>>    }
>>>    else {
>>>       CPING/CPONG OK.
>>>       break;
>>>    }
>>> }
>>> go on with socket
>> As I said: Due to the fact that the reslist is a stack it results
>> effectively in the
>> same thing as my code does. This is because the acquire_connection call
>> will get
>> the same faulty (but then closed connection) that the previous
>> ap_proxy_release_connection
>> placed back in the reslist.
> 
> Maybe I'm missing something here:
> 
> The stack design is useful, because it allows for idle connections to
> trigger the idle timeout. The most recently used connection gets reused
> first.

Exactly.

> 
> But in case the user of the connection knows, that it's broken, and
> closes it, it would make more sense to put in under the stack, since it
> is no longer connected.

Why? IMHO this only defers the effort that needs to be spent anyway to create
a new TCP connection. And keep in mind that even if I put the
faulty connection back *under* the stack, the next connection on top of the
stack has been idle even longer than the one that was faulty. So it is likely to
be faulty as well. It might be the case, though, that this faultiness is
detected earlier (in the TCP connection check) and thus our next CPING/CPONG
in the loop happens on a fresh, working TCP connection.
One question, as you are more familiar with the AJP server code on the Tomcat side:
If a connector closes down a connection due to its idleness, does it send any
kind of AJP shutdown packet via the TCP connection, or does it just close the
socket like in the HTTP keepalive case?


Regards

Rüdiger

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Rainer Jung <ra...@kippdata.de>.
Rüdiger Plüm schrieb:
> 
> 
> On 09/05/2008 06:21 PM, Mladen Turk wrote:
>> Plüm, Rüdiger, VF-Group wrote:
>>>  
>>>>>
>>>> +1 for the concept.
>>>> However for threaded servers you should call
>>>> ap_proxy_acquire_connection inside retry loop, cause there might
>>>> be available connections inside the pool.
>>>
>>> I don't think that this does what you want. If I simply continue to
>>> acquire connections from the pool without returning the faulty ones
>>> back before, other threads might starve because they cannot get
>>> connections
>>> from the reslist any longer (not even faulty ones, that they would
>>> reopen).
>>> If I return the faulty connection to the reslist, there is some
>>> likelyhood
>>> that I get the same connection back within the next acquire as the
>>> reslist
>>> is organized as a stack. IMHO this approach would only work if the
>>> reslist
>>> was organized as a queue, which it is no longer in order to get the ttl
>>> feature in conjunction with smax working correctly.
>>>
>>
>> If failed each connection should be released anyhow, so it's a
>> loop operation that will either return connection with socket
>> (potentially valid), or without a socket for reconnect, in which
>> case you break from the loop in either case.
>>
>> while (apr_proxy_acguire_connection) {
>>    fresh = 0
>>    if (conn->sock == NULL) {
>>       fresh = 1
>>    }
>>    ap_proxy_determine_connection
>>    ap_proxy_connect_to_backend
>>    if (!ajp_handle_cping_cpong) {
>>         CPING/CPONG failed. Mark the connection for closure.
>>     conn->close++;
>>         ap_proxy_release_connection
>>         if (fresh) {
>>            CPING/CPONG failed on fresh connection. bail out.
>>            return 503;
>>     }
>>    }
>>    else {
>>       CPING/CPONG OK.
>>       break;
>>    }
>> }
>> go on with socket
> 
> As I said: Due to the fact that the reslist is a stack it results
> effectively in the
> same thing as my code does. This is because the acquire_connection call
> will get
> the same faulty (but then closed connection) that the previous
> ap_proxy_release_connection
> placed back in the reslist.

Maybe I'm missing something here:

The stack design is useful, because it allows for idle connections to
trigger the idle timeout. The most recently used connection gets reused
first.

But in case the user of the connection knows, that it's broken, and
closes it, it would make more sense to put in under the stack, since it
is no longer connected.

So it seems you need a dequeue data structure to be able to reflect both
use cases.

Regards,

Rainer

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Rüdiger Plüm <r....@gmx.de>.

On 09/05/2008 06:21 PM, Mladen Turk wrote:
> Plüm, Rüdiger, VF-Group wrote:
>>  
>>>>
>>> +1 for the concept.
>>> However for threaded servers you should call
>>> ap_proxy_acquire_connection inside retry loop, cause there might
>>> be available connections inside the pool.
>>
>> I don't think that this does what you want. If I simply continue to
>> acquire connections from the pool without returning the faulty ones
>> back before, other threads might starve because they cannot get 
>> connections
>> from the reslist any longer (not even faulty ones, that they would 
>> reopen).
>> If I return the faulty connection to the reslist, there is some 
>> likelyhood
>> that I get the same connection back within the next acquire as the 
>> reslist
>> is organized as a stack. IMHO this approach would only work if the 
>> reslist
>> was organized as a queue, which it is no longer in order to get the ttl
>> feature in conjunction with smax working correctly.
>>
> 
> If failed each connection should be released anyhow, so it's a
> loop operation that will either return connection with socket
> (potentially valid), or without a socket for reconnect, in which
> case you break from the loop in either case.
> 
> while (apr_proxy_acguire_connection) {
>    fresh = 0
>    if (conn->sock == NULL) {
>       fresh = 1
>    }
>    ap_proxy_determine_connection
>    ap_proxy_connect_to_backend
>    if (!ajp_handle_cping_cpong) {
>         CPING/CPONG failed. Mark the connection for closure.
>     conn->close++;
>         ap_proxy_release_connection
>         if (fresh) {
>            CPING/CPONG failed on fresh connection. bail out.
>            return 503;
>     }
>    }
>    else {
>       CPING/CPONG OK.
>       break;
>    }
> }
> go on with socket

As I said: due to the fact that the reslist is a stack, this results effectively
in the same thing my code does. This is because the acquire_connection call will
get back the same faulty (but by then closed) connection that the previous
ap_proxy_release_connection call placed in the reslist.

Regards

Rüdiger

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Mladen Turk <mt...@apache.org>.
Plüm, Rüdiger, VF-Group wrote:
>  
>>>
>> +1 for the concept.
>> However for threaded servers you should call
>> ap_proxy_acquire_connection inside retry loop, cause there might
>> be available connections inside the pool.
> 
> I don't think that this does what you want. If I simply continue to
> acquire connections from the pool without returning the faulty ones
> back before, other threads might starve because they cannot get connections
> from the reslist any longer (not even faulty ones, that they would reopen).
> If I return the faulty connection to the reslist, there is some likelyhood
> that I get the same connection back within the next acquire as the reslist
> is organized as a stack. IMHO this approach would only work if the reslist
> was organized as a queue, which it is no longer in order to get the ttl
> feature in conjunction with smax working correctly.
>

If it failed, each connection should be released anyhow, so it's a
loop operation that will either return a connection with a socket
(potentially valid) or one without a socket for reconnect; you break
from the loop in either case.

while (ap_proxy_acquire_connection) {
    fresh = 0
    if (conn->sock == NULL) {
       fresh = 1
    }
    ap_proxy_determine_connection
    ap_proxy_connect_to_backend
    if (!ajp_handle_cping_cpong) {
       CPING/CPONG failed. Mark the connection for closure.
       conn->close++;
       ap_proxy_release_connection
       if (fresh) {
          CPING/CPONG failed on fresh connection. bail out.
          return 503;
       }
    }
    else {
       CPING/CPONG OK.
       break;
    }
}
go on with socket

Regards.
-- 
^(TM)

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by "Plüm, Rüdiger, VF-Group" <ru...@vodafone.com>.
 

> -----Original Message-----
> From: Mladen Turk 
> Sent: Friday, September 5, 2008 17:07
> To: dev@httpd.apache.org
> Subject: Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails
> 
> Plüm, Rüdiger, VF-Group wrote:
> 
> > So after the first failed CPING/CPONG we should try again with a new
> > TCP connection and should only return HTTP_SERVICE_UNAVAILABLE if this
> > fails as well. The following (and attached patch) does exactly that.
> > Thoughs to the above and/or to the patch before I commit?
> >
> 
> +1 for the concept.
> However for threaded servers you should call
> ap_proxy_acquire_connection inside retry loop, cause there might
> be available connections inside the pool.

I don't think that this does what you want. If I simply continue to
acquire connections from the pool without returning the faulty ones
first, other threads might starve because they can no longer get connections
from the reslist (not even faulty ones, which they would reopen).
If I return the faulty connection to the reslist, there is some likelihood
that I get the same connection back on the next acquire, as the reslist
is organized as a stack. IMHO this approach would only work if the reslist
were organized as a queue, which it no longer is, in order to get the ttl
feature in conjunction with smax working correctly.

Regards

Rüdiger


Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Mladen Turk <mt...@apache.org>.
Plüm, Rüdiger, VF-Group wrote:

> So after the first failed CPING/CPONG we should try again with a new TCP connection
> and should only return HTTP_SERVICE_UNAVAILABLE if this fails as well. The following
> (and attached patch) does exactly that. Thoughs to the above and/or to the patch
> before I commit?
>

+1 for the concept.
However, for threaded servers you should call
ap_proxy_acquire_connection inside the retry loop, because there might
be available connections inside the pool.
If all of them are down, then do a single reconnect, and if that fails too,
give up (that's how it's done in mod_jk).


Regards
-- 
^(TM)

Re: Behaviour of mod_proxy_ajp if CPING/CPONG fails

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Sep 5, 2008, at 10:21 AM, Plüm, Rüdiger, VF-Group wrote:

> IMHO the current behaviour of mod_proxy_ajp is not correct if a
> CPING/CPONG fails on a backend connection. In this case the status is
> set to HTTP_SERVICE_UNAVAILABLE and the scheme handler returns. In the
> case of an unbalanced backend this results in a HTTP_SERVICE_UNAVAILABLE
> (503) to be returned to the client. But a failing CPING/CPONG can be
> caused by a faulty AJP connection that was closed by the backend or a
> race condition in sending the CPING and closing the connection on the
> backend. So after the first failed CPING/CPONG we should try again with
> a new TCP connection and should only return HTTP_SERVICE_UNAVAILABLE if
> this fails as well. The following (and attached patch) does exactly
> that. Thoughs to the above and/or to the patch before I commit?
>

I like error detection, but we are always fooling ourselves if we
think that we can eliminate all race conditions by retrying... I ran
into this back when I was working on an HTTP/OPTIONS "cping/cpong"
for httpd. Although HTTP has other issues (like what to do with
keepalives and proxy settings for them, etc.), the end result was
that even if the OPTIONS "cping" succeeds, there is always the possibility
that the actual request will fail. So we should be robust in that
situation and respond in a way that is the least astonishing to the
client.

So I'm fine with retries, but it doesn't eliminate the possibility...