Posted to dev@httpd.apache.org by Yann Ylavic <yl...@gmail.com> on 2018/01/21 12:42:29 UTC

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

[Moved from users@]

On Sat, Jan 20, 2018 at 9:57 PM, Luca Toscano <to...@gmail.com> wrote:
>
> 2018-01-20 20:23 GMT+01:00 Luca Toscano <to...@gmail.com>:
>>
>> 2018-01-19 17:40 GMT+01:00 Yann Ylavic <yl...@gmail.com>:
>>>
>>> On Fri, Jan 19, 2018 at 5:14 PM, Yann Ylavic <yl...@gmail.com>
>>> wrote:
>>> > On Fri, Jan 19, 2018 at 1:46 PM, Daniel <df...@gmail.com> wrote:
>>> >> I vaguely recall some issue with reuse when using unix socket files, so
>>> >> it was deliberately set to off by default, but yes, perhaps someone
>>> >> experienced enough with mod_proxy_fcgi's inner workings can shed some
>>> >> light on this and on why or why not.
>>> >>
>>> >> With socket files I never tried enabling "enablereuse=on" and still
>>> >> got good results, so perhaps it's safer to keep it off until someone
>>> >> clarifies this issue; after all, when dealing with unix sockets the
>>> >> access delays are quite low.
>>> >
>>> > {en,dis}ablereuse has no effect on Unix Domain Sockets in mod_proxy,
>>> > they are never reused.
>>>
>>> Well, actually it shouldn't, but while the code clearly doesn't reuse
>>> sockets (creates a new one for each request), nothing seems to tell
>>> the recycler that it should close them unconditionally at the end of
>>> the request.
>>
>>
>> Would you mind pointing me to the snippet of code that does this? I am
>> trying to reproduce the issue and see if there is an fd leak, but I haven't
>> managed to so far..

Finally, after re-reading the code, I'm not sure the issue is in httpd.
In ap_proxy_connect_backend(), we expect ap_proxy_check_connection()
to check whether the connection can be reused or not, and it should
work for Unix sockets too.
I missed that call in my first analysis, so I thought a new socket was
always created...

>
> I am now able to reproduce with Hajo's settings, and indeed with
> enablereuse=on I can see a lot of fds leaked via lsof:
>
> httpd     3230 3481    www-data   93u     unix 0xffff9ada0cf60400      0t0
> 406770 type=STREAM
> httpd     3230 3481    www-data   94u     unix 0xffff9ada0cf60800      0t0
> 406773 type=STREAM
> httpd     3230 3481    www-data   95u     unix 0xffff9ada0cf66400      0t0
> 406776 type=STREAM
> [..]

Reusing connections means that we can have ThreadsPerChild sockets per
backend (per child process).
Is it really a leak (the more requests, the more sockets, never
closed), or does it stop at ThreadsPerChild?
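
For context, that per-child cap comes from the MPM configuration; a minimal
event-MPM sketch (all values here are hypothetical, not from the reported setup):

```apache
# Hypothetical event MPM settings: with enablereuse=on, each of the 25
# threads in each of the (up to) 8 children can hold its own persistent
# socket to a given backend, i.e. at most 8 * 25 = 200 sockets per backend.
<IfModule mpm_event_module>
    ServerLimit          8
    ThreadsPerChild     25
    MaxRequestWorkers  200
</IfModule>
```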

>
> With Yann's patch I cannot see them anymore, and h2load does not stop at
> 50%/60% but completes without any issue. I am still not able to understand
> why this happens after reading the proxy_util.c code though :)

My patch simply disables connection recycling/reuse for Unix sockets,
but it's probably not the right fix since reuse is supposed to work.
Could it be that php-fpm simply does not like connection
keepalive/reuse? There may be a reason why we disablereuse by default
in mod_proxy_fcgi...
Where do communications stop working at 60%; is there something in the
error log? Possibly some tcpdump/wireshark between proxy_fcgi and the
backend could help here too.


Regards,
Yann.

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Posted by Luca Toscano <to...@gmail.com>.
2018-02-04 2:27 GMT+01:00 Yann Ylavic <yl...@gmail.com>:
>
>
> I'm not sure playing with 'max' is the right thing to do in this
> case; it makes little sense to configure max_children on fpm lower than
> httpd's MaxRequestWorkers, they probably should be the same in the
> first place.
> Limits should indeed be imposed by the proxy (according to upstream
> capacity), but 'max' is kind of a waste of resources here on the proxy
> side; the right/same MaxRequestWorkers on httpd should be enough (with
> possibly reasonable/aggressive MaxSpareThreads/Servers) provided it
> does not exceed fpm's max_children.


Definitely, I agree. What I meant to say with the 'max' comment was that it
is a quick way to work around the stall :)
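
For the record, such a 'max' workaround could look like the following (socket
path and numbers are hypothetical); note that 'max' applies per httpd child
process, so the effective total is max times the number of children:

```apache
# Hypothetical: cap each child's connection pool at 10 reusable
# connections toward the php-fpm backend.
<Proxy "unix:/run/php/php-fpm.sock|fcgi://localhost/">
    ProxySet enablereuse=on max=10
</Proxy>
```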

Going to update users@ and the docs!

Luca

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Posted by Yann Ylavic <yl...@gmail.com>.
Hi Luca,

On Sun, Feb 4, 2018 at 12:37 AM, Luca Toscano <to...@gmail.com> wrote:
>
> After some tests and code review I have a theory about what's happening,
> let me know if it makes sense. PHP-FPM uses a "prefork-like" model, so one
> worker process per connection, that is really important when thinking about
> using enablereuse=on in mod_proxy_fcgi. As Yann mentioned, by default a
> total of ThreadsPerChild connections for each httpd process can be opened
> towards the FCGI backend. This was the case in my tests reported above: the
> overall (persistent) connections created/attempted was way more than the
> php-fpm processes. This can in my opinion explain very well the stall
> mentioned by Hajo in his email, and possibly the one that I was observing
> with gdb. Once all the PHP-FPM processes are busy with httpd persistent
> connections then all the other attempts will be queued and served only
> if/when a PHP-FPM process is freed (for example when Timeout/ProxyTimeout
> has elapsed).

This looks like a very plausible scenario IMO, and clearly what happens if
"MaxClients"/max_children is reached on fpm side.

>
> I re-tested everything using mod_proxy's 'max' parameter to limit the
> number of connections in the proxy pool (of each httpd process) and
> everything worked like a charm.

I'm not sure playing with 'max' is the right thing to do in this
case; it makes little sense to configure max_children on fpm lower than
httpd's MaxRequestWorkers, they probably should be the same in the
first place.
Limits should indeed be imposed by the proxy (according to upstream
capacity), but 'max' is kind of a waste of resources here on the proxy
side; the right/same MaxRequestWorkers on httpd should be enough (with
possibly reasonable/aggressive MaxSpareThreads/Servers) provided it
does not exceed fpm's max_children.
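
A sketch of the matched sizing described above (all values hypothetical):

```apache
# httpd side: keep the total number of worker threads at or below the
# backend's capacity, i.e. php-fpm's pm.max_children (say 50 here).
ServerLimit         2
ThreadsPerChild    25
MaxRequestWorkers  50   # matches a hypothetical pm.max_children = 50 on fpm
```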

>  So as far as I can tell mod_proxy_fcgi's
> connection reuse works for both TCP and Unix sockets as intended, but the
> docs should be more explicit about gotchas and precautions to take in the
> configuration. If this makes sense I'll take care of the docs updates :)

Yes, it could be useful to document the need to tune the proxy
according to the capacity of the backend (oversizing the proxy is
useful only if it does more work than the backend); inconsistencies
there can lead to stalls.


Regards,
Yann.

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Posted by Luca Toscano <to...@gmail.com>.
2018-02-01 4:38 GMT+01:00 Luca Toscano <to...@gmail.com>:

> Hi Yann,
>
> 2018-01-21 4:42 GMT-08:00 Yann Ylavic <yl...@gmail.com>:
>
>>
>> Reusing connections means that we can have ThreadsPerChild sockets per
>> backend (per child process).
>> Is it really a leak (the more requests, the more sockets, never
>> closed), or does it stop at ThreadsPerChild?
>>
>
> This was a red herring due to my ignorance about lsof and how it reports
> fds opened by a process and its threads; I don't believe that there is
> any fd leak.
>
>
>> My patch simply disables connection recycling/reuse for Unix sockets,
>> but it's probably not the right fix since reuse is supposed to work.
>> Could it be that php-fpm simply does not like connection
>> keepalive/reuse? There may be a reason why we disablereuse by default
>> in mod_proxy_fcgi...
>> Where do communications stop working at 60%; is there something in the
>> error log? Possibly some tcpdump/wireshark between proxy_fcgi and the
>> backend could help here too.
>>
>
>
> I did some testing with different httpd/php-fpm configurations and I
> noticed something interesting. I removed mod_http2 and used a static/fixed
> number of php-fpm workers to simplify the picture, and I was able to
> reproduce the issue way before reaching the 60% limit using http/1.1. In
> the logs I can clearly see that one httpd process/thread stops logging
> after sending the environment variables, and gdb shows this:
>
> Thread 2 (Thread 0x7f99ecf19700 (LWP 21755)):
> #0  0x00007f99f60b76ad in poll () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00007f99f6e024f4 in apr_poll (aprset=aprset@entry=0x7f99ecf16a00,
>     num=num@entry=1, nsds=nsds@entry=0x7f99ecf169b4, timeout=<optimized out>)
>     at poll/unix/poll.c:120
> #2  0x00007f99f0b45893 in dispatch (conn=conn@entry=0x56217d285a60,
>     conf=conf@entry=0x56217d1c6800, r=r@entry=0x7f99d8007820,
>     setaside_pool=0x7f99d800d898, err=err@entry=0x7f99ecf18b40,
>     bad_request=bad_request@entry=0x7f99ecf18ae4, has_responded=0x7f99ecf18ae8,
>     request_id=1) at mod_proxy_fcgi.c:563
> #3  0x00007f99f0b46e32 in fcgi_do_request (p=<optimized out>, origin=0x0,
>     uri=<optimized out>, url=<optimized out>, server_portstr=0x7f99ecf18b20 ":8443",
>     conf=0x56217d1c6800, conn=0x56217d285a60, r=0x7f99d8007820) at mod_proxy_fcgi.c:936
> #4  proxy_fcgi_handler (r=0x7f99d8007820, worker=<optimized out>, conf=<optimized out>,
>     url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized out>)
>     at mod_proxy_fcgi.c:1050
> [..]
>
> And all the php-fpm workers show this (sadly without debug symbols):
>
> Thread 1 (Thread 0x7fe67a0bcd40 (LWP 21882)):
> #0  0x00007fe676b2a1f0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00000000007a9734 in ?? ()
> #2  0x00000000007aad60 in fcgi_accept_request ()
> #3  0x0000000000464424 in main ()
>
> strace shows the php-fpm process blocked in read(4,
>
> Interesting stats running h2load:
>
> *   -n100000 -c10 -m10 https://etc..
>
> With http/2 I can see h2load reaching 100%, while with http/1.1 it
> doesn't even reach 10% before blocking as described above.
>
> *  -n100000 -c100 -m10 https://etc..
>
> Problems show up with http/2 too. In both cases setting Timeout (or
> ProxyTimeout) to a low value causes connections to be closed due to
> inactivity, and h2load slowly proceeds.
>
> pstree of my testing environment:
>
> |-httpd-+-httpd
> |       `-10*[httpd---21*[{httpd}]]
>
> |-php5-fpm---10*[php5-fpm]
>
>
> The impression that I have is that somehow mod_proxy_fcgi sends an FCGI
> request over a (reused) unix socket fd and polls waiting for the FCGI
> backend's response, while the backend is busy doing something else. I am
> going to do more tests in the following days, but if anybody has a clear
> idea about what's happening please let me know :)
>


After some tests and code review I have a theory about what's happening,
let me know if it makes sense. PHP-FPM uses a "prefork-like" model, so one
worker process per connection, that is really important when thinking about
using enablereuse=on in mod_proxy_fcgi. As Yann mentioned, by default a
total of ThreadsPerChild connections for each httpd process can be opened
towards the FCGI backend. This was the case in my tests reported above: the
overall (persistent) connections created/attempted was way more than the
php-fpm processes. This can in my opinion explain very well the stall
mentioned by Hajo in his email, and possibly the one that I was observing
with gdb. Once all the PHP-FPM processes are busy with httpd persistent
connections then all the other attempts will be queued and served only
if/when a PHP-FPM process is freed (for example when Timeout/ProxyTimeout
has elapsed).
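
To make the mismatch concrete, a back-of-the-envelope sketch (all numbers
below are illustrative assumptions, not taken from a real configuration):

```python
# Worst case with enablereuse=on: every httpd thread in every child can
# hold one persistent connection to the FCGI backend, while a prefork
# backend like php-fpm serves exactly one connection per worker process.

def max_persistent_connections(httpd_children, threads_per_child):
    return httpd_children * threads_per_child

httpd_children = 10     # e.g. the 10 httpd children in the pstree above
threads_per_child = 25  # hypothetical ThreadsPerChild
fpm_workers = 10        # hypothetical static pm.max_children

conns = max_persistent_connections(httpd_children, threads_per_child)
print(conns)                 # 250 potential persistent connections
print(conns > fpm_workers)   # True: far more than the 10 workers available
```

Once the workers are each pinned to a persistent connection, any remaining
attempt queues until a worker frees up, which matches the stall above.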

I re-tested everything using mod_proxy's 'max' parameter to limit the
number of connections in the proxy pool (of each httpd process) and
everything worked like a charm. So as far as I can tell mod_proxy_fcgi's
connection reuse works for both TCP and Unix sockets as intended, but the
docs should be more explicit about gotchas and precautions to take in the
configuration. If this makes sense I'll take care of the docs updates :)

Thanks!

Luca

Re: [users@httpd] problems benchmarking php-fpm/proxy_fcgi with h2load

Posted by Luca Toscano <to...@gmail.com>.
Hi Yann,

2018-01-21 4:42 GMT-08:00 Yann Ylavic <yl...@gmail.com>:

>
> Reusing connections means that we can have ThreadsPerChild sockets per
> backend (per child process).
> Is it really a leak (the more requests, the more sockets, never
> closed), or does it stop at ThreadsPerChild?
>

This was a red herring due to my ignorance about lsof and how it reports
fds opened by a process and its threads; I don't believe that there is
any fd leak.


> My patch simply disables connection recycling/reuse for Unix sockets,
> but it's probably not the right fix since reuse is supposed to work.
> Could it be that php-fpm simply does not like connection
> keepalive/reuse? There may be a reason why we disablereuse by default
> in mod_proxy_fcgi...
> Where do communications stop working at 60%; is there something in the
> error log? Possibly some tcpdump/wireshark between proxy_fcgi and the
> backend could help here too.
>


I did some testing with different httpd/php-fpm configurations and I
noticed something interesting. I removed mod_http2 and used a static/fixed
number of php-fpm workers to simplify the picture, and I was able to
reproduce the issue way before reaching the 60% limit using http/1.1. In
the logs I can clearly see that one httpd process/thread stops logging
after sending the environment variables, and gdb shows this:

Thread 2 (Thread 0x7f99ecf19700 (LWP 21755)):
#0  0x00007f99f60b76ad in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f99f6e024f4 in apr_poll (aprset=aprset@entry=0x7f99ecf16a00,
    num=num@entry=1, nsds=nsds@entry=0x7f99ecf169b4, timeout=<optimized out>)
    at poll/unix/poll.c:120
#2  0x00007f99f0b45893 in dispatch (conn=conn@entry=0x56217d285a60,
    conf=conf@entry=0x56217d1c6800, r=r@entry=0x7f99d8007820,
    setaside_pool=0x7f99d800d898, err=err@entry=0x7f99ecf18b40,
    bad_request=bad_request@entry=0x7f99ecf18ae4, has_responded=0x7f99ecf18ae8,
    request_id=1) at mod_proxy_fcgi.c:563
#3  0x00007f99f0b46e32 in fcgi_do_request (p=<optimized out>, origin=0x0,
    uri=<optimized out>, url=<optimized out>, server_portstr=0x7f99ecf18b20 ":8443",
    conf=0x56217d1c6800, conn=0x56217d285a60, r=0x7f99d8007820) at mod_proxy_fcgi.c:936
#4  proxy_fcgi_handler (r=0x7f99d8007820, worker=<optimized out>, conf=<optimized out>,
    url=<optimized out>, proxyname=<optimized out>, proxyport=<optimized out>)
    at mod_proxy_fcgi.c:1050
[..]

And all the php-fpm workers show this (sadly without debug symbols):

Thread 1 (Thread 0x7fe67a0bcd40 (LWP 21882)):
#0  0x00007fe676b2a1f0 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x00000000007a9734 in ?? ()
#2  0x00000000007aad60 in fcgi_accept_request ()
#3  0x0000000000464424 in main ()

strace shows the php-fpm process blocked in read(4,

Interesting stats running h2load:

*   -n100000 -c10 -m10 https://etc..

With http/2 I can see h2load reaching 100%, while with http/1.1 it
doesn't even reach 10% before blocking as described above.

*  -n100000 -c100 -m10 https://etc..

Problems show up with http/2 too. In both cases setting Timeout (or
ProxyTimeout) to a low value causes connections to be closed due to
inactivity, and h2load slowly proceeds.

pstree of my testing environment:

|-httpd-+-httpd
|       `-10*[httpd---21*[{httpd}]]

|-php5-fpm---10*[php5-fpm]


The impression that I have is that somehow mod_proxy_fcgi sends an FCGI
request over a (reused) unix socket fd and polls waiting for the FCGI
backend's response, while the backend is busy doing something else. I am
going to do more tests in the following days, but if anybody has a clear
idea about what's happening please let me know :)

Thanks!

Luca