You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Rainer Jung <ra...@kippdata.de> on 2017/10/01 22:08:55 UTC

Solaris prefork proxy failures (Was: [VOTE] Release Apache httpd 2.4.28 as GA)

I observe sporadic failures in the proxy tests for 2.4.x.

They only happen on Solaris (Sparc) and only for prefork MPM. Although
they are not strictly reproducible, I think they only happen with APR
1.6 an with mod_http2/mod_proxy_http2 loaded (although they do not
happen during their test cases).

The failure pattern is, that one child process gets the request it
should (reverse) proxy. The process connects to the server to forward
the request and then hangs until the proxy timeout triggers. Then it
returns a proxy timeout but in addition once the timeout has been sent
to the client, the same process starts to work on the backend request.

Example excerpt from the error log:

[Sun Oct 01 14:22:56.980082 2017] [core:trace5] [pid 1310]
protocol.c(645): [client 127.0.0.1:44963] Request received from client:
GET /reverse/modules/cgi/nph-102.pl HTTP/1.1
...
[Sun Oct 01 14:22:56.981065 2017] [proxy:trace2] [pid 1310]
proxy_util.c(1962): [client 127.0.0.1:44963] http: found worker
http://localhost:8529/ for http://localhost:8529/modules/cgi/nph-102.pl
...
[Sun Oct 01 14:22:56.981552 2017] [proxy:debug] [pid 1310]
proxy_util.c(2887): AH02824: HTTP: connection established with
127.0.0.1:8529 (localhost)
[Sun Oct 01 14:22:56.982969 2017] [proxy:debug] [pid 1310]
proxy_util.c(3054): AH00962: HTTP: connection complete to 127.0.0.1:8529
(localhost)
[Sun Oct 01 14:22:56.983142 2017] [core:trace6] [pid 1310]
core_filters.c(525): [remote 127.0.0.1:8529] core_output_filter:
flushing because of FLUSH bucket

- here 60 seconds nothing happening or in the log -

[Sun Oct 01 14:23:56.983933 2017] [proxy_http:error] [pid 1310]
(70007)The timeout specified has expired: [client 127.0.0.1:44963]
AH01102: error reading status line from remote server localhost:8529
[Sun Oct 01 14:23:56.984150 2017] [proxy_http:debug] [pid 1310]
mod_proxy_http.c(1265): [client 127.0.0.1:44963] AH01103: read timeout
[Sun Oct 01 14:23:56.984178 2017] [proxy_http:debug] [pid 1310]
mod_proxy_http.c(1324): [client 127.0.0.1:44963] AH01105: NOT Closing
connection to client although reading from backend server localhost:8529
failed.
...
[Sun Oct 01 14:23:56.995756 2017] [http:trace3] [pid 1310]
http_filters.c(1128): [client 127.0.0.1:44963] Response sent with status
502, headers:
...
[Sun Oct 01 14:23:56.996158 2017] [core:trace6] [pid 1310]
core_filters.c(525): [client 127.0.0.1:44963] core_output_filter:
flushing because of FLUSH bucket

- and then suddenly (but too late) work on the original backend request
starts -

[Sun Oct 01 14:23:56.999866 2017] [core:trace5] [pid 1310]
protocol.c(645): [client 127.0.0.1:44966] Request received from client:
GET /modules/cgi/nph-102.pl HTTP/1.1
...
[Sun Oct 01 14:23:57.103062 2017] [core:trace6] [pid 1310]
core_filters.c(525): [client 127.0.0.1:44966] core_output_filter:
flushing because of FLUSH bucket

Note that during the time that the one child waits on the backend
response, the two other prefork child processes were hanging in
accept_mutex_on(), waiting for the lock. One also has to run bigger
parts of the test suite. Just running the proxy part does not suffice.

So it seems for some reason the proxy module can't effectively forward
the request to the idle children, instead it likely connects to the
kernel doing the accept and once the first child times out and tries to
acquire the accept mutex, it gets it and processes the old forwarded
request. The accept mutex used was pthread. It seems maybe the single
remaining child process that actually does work might not have unlocked
the accept mutex due to some race or error during an earlier test case.

Possible ingredients could be (speculation):

- APR 1.6 mutex changes and possible defaults changes
- HTTP2 additional mutex use somehow interfering on Solaris
- Accept handling changes

Since I can only observe it with prefork in combination with http2 which
is not a good combination in any way, I am not too concerned. Still it
could be useful to understand whats going on.

I will try to look further, but I am not too optimistic to get to the
bottom of this.

Regards,

Rainer

Re: [VOTE] Release Apache httpd 2.4.28 as GA

Posted by "Helmut K. C. Tessarek" <te...@evermeet.cx>.
On 2017-10-01 18:08, Rainer Jung wrote:
> Since I can only observe it with prefork in combination with http2 which
> is not a good combination in any way, I am not too concerned. Still it
> could be useful to understand whats going on.

How can this even be? http2 is automatically deactivated when mpm is
prefork. Stefan added this in 2.4.27.
There is no h2 with prefork anymore.

Cheers,
  K. C.

-- 
regards Helmut K. C. Tessarek              KeyID 0xF7832007C11F128D
Key fingerprint = 28A3 1666 4FE8 D72C CFD5 8B23 F783 2007 C11F 128D

/*
   Thou shalt not follow the NULL pointer for chaos and madness
   await thee at its end.
*/