You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Luca Toscano <to...@gmail.com> on 2018/02/01 03:38:36 UTC

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Hi Yann,

2018-01-21 4:42 GMT-08:00 Yann Ylavic <yl...@gmail.com>:

>
> Reusing connections means that we can have ThreadsPerChild sockets per
> backend (per child process).
> Is it really a leak (the more requests, the more sockets, never
> closed), or does it stop at ThreadsPerChild?
>

This was a red herring due to my ignorance about lsof and how it reports
fds opened by process and its threads, I don't believe that there is any
leak of fds.


> My patch simply disables connections recycling/reuse for Unix sockets,
> but it's probably not the right since that's supposed to work.
> Could it be that php-fpm simply does not like connections
> keepalive/reuse? There may be a reason why we disablereuse by default
> in mod_proxy_fcgi...
> Where do communications stop working at 60%, something in the error
> log? Possibly some tcpdump/wireshark between proxy_fcgi and the
> backend could help here too.
>


I did some testing with different httpd/php-fpm configurations and I
noticed something interesting. I removed mod_http2 and used a static/fixed
number of php-fpm workers to simplify the picture, and I was able to
reproduce the issue way before reaching the 60% limit using http/1.1. In
the logs I can clearly see that one httpd process/thread stops logging
after sending the environment variables, and gdb shows this:

Thread 2 (Thread 0x7f99ecf19700 (LWP 21755)):
#0  0x00007f99f60b76ad in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f99f6e024f4 in apr_poll (aprset=aprset@entry=0x7f99ecf16a00,
    num=num@entry=1, nsds=nsds@entry=0x7f99ecf169b4, timeout=<optimized
out>)
    at poll/unix/poll.c:120
#2  0x00007f99f0b45893 in dispatch (conn=conn@entry=0x56217d285a60,
    conf=conf@entry=0x56217d1c6800, r=r@entry=0x7f99d8007820,
    setaside_pool=0x7f99d800d898, err=err@entry=0x7f99ecf18b40,
    bad_request=bad_request@entry=0x7f99ecf18ae4,
has_responded=0x7f99ecf18ae8,
    request_id=1) at mod_proxy_fcgi.c:563
#3  0x00007f99f0b46e32 in fcgi_do_request (p=<optimized out>, origin=0x0,
    uri=<optimized out>, url=<optimized out>, server_portstr=0x7f99ecf18b20
":8443",
    conf=0x56217d1c6800, conn=0x56217d285a60, r=0x7f99d8007820) at
mod_proxy_fcgi.c:936
#4  proxy_fcgi_handler (r=0x7f99d8007820, worker=<optimized out>,
conf=<optimized out>,
    url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized
out>)
    at mod_proxy_fcgi.c:1050
[..]

And all the php-fpm workers this (sadly without debug symbols):

Thread 1 (Thread 0x7fe67a0bcd40 (LWP 21882)):
#0  0x00007fe676b2a1f0 in __read_nocancel () at
../sysdeps/unix/syscall-template.S:84
#1  0x00000000007a9734 in ?? ()
#2  0x00000000007aad60 in fcgi_accept_request ()
#3  0x0000000000464424 in main ()

strace shows the php-fpm process blocked in read(4,

Interesting stats running h2load:

*   -n100000 -c10 -m10 https://etc..

With http/2 I can see the h2load reaching 100%, meanwhile with http/1.1 it
doesn't even reach 10% before blocking like described above.

*  -n100000 -c100 -m10 https://etc..

Problems showing up with http/2 too. In both case setting Timeout (or
ProxyTimeout) to a low value makes connections to get closed due to
inactivity and h2load slowly proceeds.

pstree of my testing environment:

|-httpd-+-httpd
|       `-10*[httpd---21*[{httpd}]]

|-php5-fpm---10*[php5-fpm]


The impression that I have is that somehow mod_proxy_fcgi sends a FCGI
request over a (reused) unix socket's fd and waits polling the fcgi's
backend response, that in the meantime is doing something else. I am going
to do more tests in the following days, but if anybody has a clear idea
about what's happening please let me know :)

Thanks!

Luca

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Posted by Luca Toscano <to...@gmail.com>.

2018-02-04 2:27 GMT+01:00 Yann Ylavic <yl...@gmail.com>:
>
>
> I'm not sure playing with 'max' is the the right thing to do in this
> case, it makes few sense to configure max_children on fpm lower than
> httpd's MaxRequestWorkers, they probably should be the same in the
> first place.
> Limits should indeed be imposed by the proxy (according to upstream
> capacity), but 'max' is kind of waste of resources here on the proxy
> side, the right/same MaxRequestWorkers on httpd should be enough (with
> possibly reasonable/agressive MaxSpareThreads/Servers) provided it
> does not exceed fpm's max_children.


Definitely, I agree. What I meant to say with the 'max' comment was that it
is a quick way to workaround the stall :)

Going to update users@ and the docs!

Luca

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Posted by Yann Ylavic <yl...@gmail.com>.

Hi Luca,

On Sun, Feb 4, 2018 at 12:37 AM, Luca Toscano <to...@gmail.com> wrote:
>
> After some tests and code reviewed I have a theory about what's happening,
> let me know if it makes sense. PHP-FPM uses a "prefork-like" model, so one
> worker process per connection, that is really important when thinking about
> using enablereuse=on in mod_proxy_fcgi. As Yann mentioned, by default a
> total of ThreadsPerChild connections for each httpd process can be opened
> towards the FCGI backend. This was the case in my tests reported above: the
> overall (persistent) connections created/attempted was way more than the
> php-fpm processes. This can in my opinion explain very well the stall
> mentioned by Hajo in his email, and possibly the one that I was observing
> with gdb. Once all the PHP-FPM processes are busy with httpd persistent
> connections then all the other attempts will be queued and served only
> if/when a PHP-FPM process is freed (for example Timeut/ProxyTimeout
> elapsed).

This looks a very plausible scenario IMO, and clearly what happens if
"MaxClients"/max_children is reached on fpm side.

>
> I re-tested everything using the mod_proxy's 'max' parameter to limit the
> number of connections in the proxy pool (of each httpd process) and
> everything worked like a charm.

I'm not sure playing with 'max' is the the right thing to do in this
case, it makes few sense to configure max_children on fpm lower than
httpd's MaxRequestWorkers, they probably should be the same in the
first place.
Limits should indeed be imposed by the proxy (according to upstream
capacity), but 'max' is kind of waste of resources here on the proxy
side, the right/same MaxRequestWorkers on httpd should be enough (with
possibly reasonable/agressive MaxSpareThreads/Servers) provided it
does not exceed fpm's max_children.

>  So as far as I can tell mod_proxy_fcgi's
> connection reuse works for both TCP and Unix sockets as intended, but the
> docs should be more explicit about gotchas and precautions to take in the
> configuration. If this makes sense I'll take care of the docs updates :)

Yes, it could be useful to document the need to tune the proxy
according to the capacity of the backend (oversizing the proxy is
useful only if its work is higher than the backend's), inconsistencies
there can leed to stalls.

Regards,
Yann.

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Posted by Luca Toscano <to...@gmail.com>.

2018-02-01 4:38 GMT+01:00 Luca Toscano <to...@gmail.com>:

> Hi Yann,
>
> 2018-01-21 4:42 GMT-08:00 Yann Ylavic <yl...@gmail.com>:
>
>>
>> Reusing connections means that we can have ThreadsPerChild sockets per
>> backend (per child process).
>> Is it really a leak (the more requests, the more sockets, never
>> closed), or does it stop at ThreadsPerChild?
>>
>
> This was a red herring due to my ignorance about lsof and how it reports
> fds opened by process and its threads, I don't believe that there is any
> leak of fds.
>
>
>> My patch simply disables connections recycling/reuse for Unix sockets,
>> but it's probably not the right since that's supposed to work.
>> Could it be that php-fpm simply does not like connections
>> keepalive/reuse? There may be a reason why we disablereuse by default
>> in mod_proxy_fcgi...
>> Where do communications stop working at 60%, something in the error
>> log? Possibly some tcpdump/wireshark between proxy_fcgi and the
>> backend could help here too.
>>
>
>
> I did some testing with different httpd/php-fpm configurations and I
> noticed something interesting. I removed mod_http2 and used a static/fixed
> number of php-fpm workers to simplify the picture, and I was able to
> reproduce the issue way before reaching the 60% limit using http/1.1. In
> the logs I can clearly see that one httpd process/thread stops logging
> after sending the environment variables, and gdb shows this:
>
> Thread 2 (Thread 0x7f99ecf19700 (LWP 21755)):
> #0  0x00007f99f60b76ad in poll () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00007f99f6e024f4 in apr_poll (aprset=aprset@entry=0x7f99ecf16a00,
>     num=num@entry=1, nsds=nsds@entry=0x7f99ecf169b4, timeout=<optimized
> out>)
>     at poll/unix/poll.c:120
> #2  0x00007f99f0b45893 in dispatch (conn=conn@entry=0x56217d285a60,
>     conf=conf@entry=0x56217d1c6800, r=r@entry=0x7f99d8007820,
>     setaside_pool=0x7f99d800d898, err=err@entry=0x7f99ecf18b40,
>     bad_request=bad_request@entry=0x7f99ecf18ae4,
> has_responded=0x7f99ecf18ae8,
>     request_id=1) at mod_proxy_fcgi.c:563
> #3  0x00007f99f0b46e32 in fcgi_do_request (p=<optimized out>, origin=0x0,
>     uri=<optimized out>, url=<optimized out>,
> server_portstr=0x7f99ecf18b20 ":8443",
>     conf=0x56217d1c6800, conn=0x56217d285a60, r=0x7f99d8007820) at
> mod_proxy_fcgi.c:936
> #4  proxy_fcgi_handler (r=0x7f99d8007820, worker=<optimized out>,
> conf=<optimized out>,
>     url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized
> out>)
>     at mod_proxy_fcgi.c:1050
> [..]
>
> And all the php-fpm workers this (sadly without debug symbols):
>
> Thread 1 (Thread 0x7fe67a0bcd40 (LWP 21882)):
> #0  0x00007fe676b2a1f0 in __read_nocancel () at ../sysdeps/unix/syscall-
> template.S:84
> #1  0x00000000007a9734 in ?? ()
> #2  0x00000000007aad60 in fcgi_accept_request ()
> #3  0x0000000000464424 in main ()
>
> strace shows the php-fpm process blocked in read(4,
>
> Interesting stats running h2load:
>
> *   -n100000 -c10 -m10 https://etc..
>
> With http/2 I can see the h2load reaching 100%, meanwhile with http/1.1 it
> doesn't even reach 10% before blocking like described above.
>
> *  -n100000 -c100 -m10 https://etc..
>
> Problems showing up with http/2 too. In both case setting Timeout (or
> ProxyTimeout) to a low value makes connections to get closed due to
> inactivity and h2load slowly proceeds.
>
> pstree of my testing environment:
>
> |-httpd-+-httpd
> |       `-10*[httpd---21*[{httpd}]]
>
> |-php5-fpm---10*[php5-fpm]
>
>
> The impression that I have is that somehow mod_proxy_fcgi sends a FCGI
> request over a (reused) unix socket's fd and waits polling the fcgi's
> backend response, that in the meantime is doing something else. I am going
> to do more tests in the following days, but if anybody has a clear idea
> about what's happening please let me know :)
>


After some tests and code reviewed I have a theory about what's happening,
let me know if it makes sense. PHP-FPM uses a "prefork-like" model, so one
worker process per connection, that is really important when thinking about
using enablereuse=on in mod_proxy_fcgi. As Yann mentioned, by default a
total of ThreadsPerChild connections for each httpd process can be opened
towards the FCGI backend. This was the case in my tests reported above: the
overall (persistent) connections created/attempted was way more than the
php-fpm processes. This can in my opinion explain very well the stall
mentioned by Hajo in his email, and possibly the one that I was observing
with gdb. Once all the PHP-FPM processes are busy with httpd persistent
connections then all the other attempts will be queued and served only
if/when a PHP-FPM process is freed (for example Timeut/ProxyTimeout
elapsed).

I re-tested everything using the mod_proxy's 'max' parameter to limit the
number of connections in the proxy pool (of each httpd process) and
everything worked like a charm. So as far as I can tell mod_proxy_fcgi's
connection reuse works for both TCP and Unix sockets as intended, but the
docs should be more explicit about gotchas and precautions to take in the
configuration. If this makes sense I'll take care of the docs updates :)

Thanks!

Luca