Posted to dev@httpd.apache.org by Luca Toscano <to...@gmail.com> on 2016/04/09 12:26:56 UTC

Thundering herd and MPMs (for dummies)

Hi Apache devs,

as part of my documentation duties I would like to add some details about
how connections are accepted in Event (for
http://httpd.apache.org/docs/current/misc/perf-tuning.html and
https://httpd.apache.org/docs/2.4/mod/event.html). I am not much of an
expert on the MPM code, so what I am going to say could be terribly wrong;
please be patient :)

Everything started when I read the documentation for SO_REUSEPORT, in which
event seems to be the only MPM not getting the same performance
improvements as worker/prefork. I tried to read the code to get a better
idea of why, and I tried to compare the various MPMs.

My understanding is:

1) when only one listening socket is configured, httpd leverages modern
kernel features (when available) and avoids taking a process mutex before
accept (a mutex that essentially serializes the accept calls).
2) when multiple listening sockets are configured, some sort of control is
needed, since processes/threads need to know which socket is ready for
accept (or other events of course). APR offers the apr_pollset_* functions
to solve this problem, for example using select/[e]poll over multiple
listening sockets. The thundering herd problem arises when select/[e]poll
is used by all the processes/threads over the same listening sockets,
because each event wakes all of them up at the same time, causing extra
kernel work (and CPU utilization).

SAFE_ACCEPT() is usually implemented in the various MPMs to solve 2),
essentially serializing select/[e]poll/accept calls with a mutex (like SysV
semaphores). Prefork has to do this in all its processes because there is a
1:1 correspondence between process and connection served, while Worker is a
bit smarter and delegates only one thread per process to the role of
"listener", assigning the accepted connection/fd to the first available
worker thread.
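
To make the serialization idea concrete, here is a minimal Python sketch. It
is not httpd or APR code: threads stand in for child processes, a
threading.Lock stands in for the accept mutex, and Python's
selectors.DefaultSelector stands in for an APR pollset. The function name
run_serialized_accept and its parameters are hypothetical.

```python
import selectors
import socket
import threading


def run_serialized_accept(num_workers=4, num_clients=8):
    """Threads stand in for children; a Lock stands in for the accept mutex."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(num_clients)
    listener.setblocking(False)

    accept_mutex = threading.Lock()
    stats = {"accepted": 0, "eagain": 0}
    all_done = threading.Event()

    def worker():
        pollset = selectors.DefaultSelector()
        pollset.register(listener, selectors.EVENT_READ)
        while not all_done.is_set():
            # Poll and accept while holding the mutex, so only one worker
            # at a time can see and consume a pending connection.
            with accept_mutex:
                if not pollset.select(timeout=0.05):
                    continue
                try:
                    conn, _addr = listener.accept()
                except BlockingIOError:
                    stats["eagain"] += 1
                    continue
                stats["accepted"] += 1
                if stats["accepted"] == num_clients:
                    all_done.set()
            conn.close()
        pollset.close()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    clients = [socket.create_connection(listener.getsockname())
               for _ in range(num_clients)]

    all_done.wait(timeout=5.0)
    all_done.set()                       # make sure workers exit on timeout too
    for thread in threads:
        thread.join()
    for client in clients:
        client.close()
    listener.close()
    return stats
```

Because the poll and the accept happen under the same lock, a worker that
sees the listener as readable is the only one that can consume the pending
connection, so the "eagain" counter is expected to stay at zero. The price
is that every accept in the whole server goes through one lock.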

I was a bit puzzled not to find any SAFE_ACCEPT in event, but eventually
(and thanks to the #dev IRC channel) I think I know why: Event uses the more
recent epoll/kqueue/etc. when available and sets the sockets in the pollset
to APR_SO_NONBLOCK; apr_pollset_poll is configured to wait only for a short
timeout to catch events, returning (and not blocking) otherwise. The only
lock used is thread based (one per process), since the pollset is updated by
the worker threads periodically (keep-alive, lingering close, etc. force
the worker to give control of the fd back to the listener so that it is
free to do something else). The thundering herd issue in event is not really
a major problem since it has been mitigated using a combination of timeouts
and non-blocking I/O in the listeners (plus the listeners are usually few
compared to the total number of worker threads).
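
The listener/worker split described above can be sketched as follows. This
is a simplified Python illustration under stated assumptions, not the event
MPM's actual code: one listener thread polls a non-blocking listening socket
with a short timeout and hands each accepted connection to whichever worker
thread is free. The hand-back of keep-alive or lingering-close sockets from
workers to the listener is left out. The function name
serve_with_one_listener and the "worker-N" replies are made up for the
example.

```python
import queue
import selectors
import socket
import threading


def serve_with_one_listener(num_workers=3, num_clients=5):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(16)
    listener.setblocking(False)

    handoff = queue.Queue()              # listener -> workers
    stop = threading.Event()

    def listener_loop():
        pollset = selectors.DefaultSelector()
        pollset.register(listener, selectors.EVENT_READ)
        while not stop.is_set():
            # Wait only briefly, then return to the loop even without events.
            if not pollset.select(timeout=0.05):
                continue
            try:
                conn, _addr = listener.accept()
            except BlockingIOError:
                continue                 # nothing to accept after all
            conn.setblocking(True)       # workers use plain blocking I/O
            handoff.put(conn)
        pollset.close()

    def worker_loop(name):
        while True:
            conn = handoff.get()         # first free worker gets the socket
            if conn is None:             # sentinel: shut down
                return
            with conn:
                conn.sendall(name.encode("ascii"))

    listener_thread = threading.Thread(target=listener_loop)
    workers = [threading.Thread(target=worker_loop, args=(f"worker-{i}",))
               for i in range(num_workers)]
    listener_thread.start()
    for thread in workers:
        thread.start()

    replies = []
    for _ in range(num_clients):
        with socket.create_connection(listener.getsockname(),
                                      timeout=5.0) as client:
            with client.makefile("rb") as stream:
                replies.append(stream.read().decode("ascii"))

    stop.set()
    listener_thread.join()
    for _ in workers:
        handoff.put(None)
    for thread in workers:
        thread.join()
    listener.close()
    return replies
```

Within one process only the listener thread ever calls accept, so no mutex
is needed between the threads of that process. Whether listeners in
different processes can collide on a shared listening socket is a separate
question.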

Last but not least,
https://httpd.apache.org/docs/current/mod/core.html#mutex is therefore not
needed when using event.

Does this vaguely resemble reality? If not, is there anybody kind enough to
give me some direction about what to look at/read? Documentation will be
updated in return, I promise :)

Thanks!

Regards,

Luca

Re: Thundering herd and MPMs (for dummies)

Posted by Luca Toscano <to...@gmail.com>.
2016-04-18 8:47 GMT+02:00 Luca Toscano <to...@gmail.com>:

> Hi Yann!
>
> 2016-04-16 14:20 GMT+02:00 Yann Ylavic <yl...@gmail.com>:
>
>> On Sat, Apr 16, 2016 at 2:17 PM, Yann Ylavic <yl...@gmail.com>
>> wrote:
>> > Hi Luca,
>> >
>> > On Sat, Apr 16, 2016 at 12:07 PM, Luca Toscano <to...@gmail.com>
>> wrote:
>> >> The sockets are non blocking and without any guard before the
>> >> apr_pollset_poll (between processes I mean) there might be the risk of
>> >> having two or more listener threads trying to accept the same new
>> >> connection, ending up in only one proceeding and the rest getting
>> EAGAIN.
>> >
>> > On modern systems, the thundering hurd is not an issue anymore (does
>> > not happen).
>> > There won't be multiple listeners (threads or processes) woken up at
>> > the same time for the same incoming connection when
>> > epoll_wait()/kevent()/... are used (see the corresponding man pages,
>> > EAGAIN is not a possible error, while it is for eg. poll()).
>> > So when accept() is called, we can be sure , so a fortiori for
>> epoll()+accept().
>>
>> [Sorry, unexpected send...]
>> So when accept() is called, we can be sure that a connection is available.
>>
>> >
>> > Since, as you noticed, mpm_event is meant for modern systems, not
>> > ACCEPT_MUTEX is implemented.
>>
>
> Thanks a lot for the answer, it makes more sense now but I still have some
> doubts. I'd have some questions for you :)
>
> My understanding is that each process/thread block initializes an
> event_pollset containing initially all the listening sockets (from Listen)
> and then later on all the ones related to keep alive / lingering close /
> etc.. sockets re-assigned to the listener by workers. Each process/threads
> block handles separate sockets except the listening ones that are "shared"
> (my understanding).
>
> Before sending the email I took a look to
> http://man7.org/linux/man-pages/man7/epoll.7.html and Q2/A2 (Q&A) states
> that the same socket "monitored" by different pollsets (or epoll instances,
> depending on the nomenclature) will get reported in each of them once an
> event is ready. EPOLLONESHOT seemed the only flag for epoll_ctl to use to
> avoid multiple threads/processes waking up at the same time, but I didn't
> find any trace of it in apr/httpd.
>
> So I am still super confused about how multiple listener threads
> (belonging to different processes and pollsets) won't be woken up at the
> same time by epoll_wait when a new connection lands to httpd. The
> explanation that I gave to myself was that with non blocking sockets and
> very few listeners the overhead of getting all of them to (try to) accept
> the same connection is not that heavy and could be acceptable performance
> wise (a simple EAGAIN returned by accept is not a big deal).
>
> I know that my understanding about epoll/httpd is really wrong but still
> not super convinced about where. If you still have patience (and time),
> would you mind to point me to a snippet of code that could solve my doubts?
> I am asking tons of questions because I'd like to write the most precise
> info in the docs without risking to confuse more readers like me (for
> example in
> http://httpd.apache.org/docs/current/misc/perf-tuning.html#runtime).
>
>
I tried some experiments with the latest httpd 2.4.x code to better
understand the problem. I started by adding basic logging around the
accept() part of event's listener thread:

Trivial patch, probably horribly written: http://apaste.info/rL5

I used a basic httpd.conf that starts 6 processes with event and two
listening ports (80 and 8080). I then used curl to make requests to
localhost and saw only a single accept attempt every time, like:

[mpm_event:info] [pid 5975:tid 139917770266368] PT_ACCEPT after epoll_wait
[mpm_event:info] [pid 5975:tid 139917770266368] Accepting..
[mpm_event:info] [pid 5975:tid 139917770266368] Accept went fine!

I expected to also see failed accept attempts, which would have validated
my theory, but no luck. So I started to strace the httpd processes (Linux,
Debian) to get more info:

strace -f $(pgrep httpd | sed -e 's/^/-p/g')

I saw regular calls to epoll_wait from all the listener threads, as
expected, and when I made an HTTP request I got something interesting:

http://apaste.info/oox

The only way that I can explain this behavior is strace's "violent" way of
tracing a program, namely stopping it each time a system call is invoked to
log the event somewhere (and hence slowing everything down a lot). Probably
on busy servers different listeners are not in sync while waiting for
apr_pollset_poll events, so the multiple wake-up issue is not really a
concern.

My point being: this is not bad behavior but a very good trade-off to
avoid any accept mutex/serialization. It might be very useful information
for users to have, especially when comparing httpd with other solutions
(httpd is awesome). So if you like the idea, and if my understanding is
correct, I'll update the docs. Otherwise you are free to ban me from this
mailing list if you wish :)

Thanks again for the patience!

Luca

Re: Thundering herd and MPMs (for dummies)

Posted by Luca Toscano <to...@gmail.com>.
Hi Yann!

2016-04-16 14:20 GMT+02:00 Yann Ylavic <yl...@gmail.com>:

> On Sat, Apr 16, 2016 at 2:17 PM, Yann Ylavic <yl...@gmail.com> wrote:
> > Hi Luca,
> >
> > On Sat, Apr 16, 2016 at 12:07 PM, Luca Toscano <to...@gmail.com>
> wrote:
> >> The sockets are non blocking and without any guard before the
> >> apr_pollset_poll (between processes I mean) there might be the risk of
> >> having two or more listener threads trying to accept the same new
> >> connection, ending up in only one proceeding and the rest getting
> EAGAIN.
> >
> > On modern systems, the thundering hurd is not an issue anymore (does
> > not happen).
> > There won't be multiple listeners (threads or processes) woken up at
> > the same time for the same incoming connection when
> > epoll_wait()/kevent()/... are used (see the corresponding man pages,
> > EAGAIN is not a possible error, while it is for eg. poll()).
> > So when accept() is called, we can be sure , so a fortiori for
> epoll()+accept().
>
> [Sorry, unexpected send...]
> So when accept() is called, we can be sure that a connection is available.
>
> >
> > Since, as you noticed, mpm_event is meant for modern systems, not
> > ACCEPT_MUTEX is implemented.
>

Thanks a lot for the answer, it makes more sense now but I still have some
doubts, and a few questions for you :)

My understanding is that each process/thread block initializes an
event_pollset that initially contains all the listening sockets (from
Listen) and later on also the keep-alive / lingering-close / etc. sockets
handed back to the listener by the workers. Each process/thread block
handles separate sockets, except the listening ones, which are "shared"
(my understanding).

Before sending the email I took a look at
http://man7.org/linux/man-pages/man7/epoll.7.html and Q2/A2 (Q&A) states
that the same socket "monitored" by different pollsets (or epoll instances,
depending on the nomenclature) will get reported in each of them once an
event is ready. EPOLLONESHOT seemed to be the only epoll_ctl flag that
could avoid multiple threads/processes waking up at the same time, but I
didn't find any trace of it in apr/httpd.

So I am still super confused about how multiple listener threads (belonging
to different processes and pollsets) won't be woken up at the same time by
epoll_wait when a new connection arrives at httpd. The explanation that I
gave myself was that, with non-blocking sockets and very few listeners,
the overhead of getting all of them to (try to) accept the same connection
is not that heavy and could be acceptable performance-wise (a simple EAGAIN
returned by accept is not a big deal).
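
The scenario I have in mind can be reproduced in miniature. The following
is a minimal Python sketch, not httpd or APR code: it assumes that several
independent pollsets (Python's selectors.DefaultSelector, which is backed
by epoll on Linux) watch the same non-blocking listening socket, and that
every listener polls before any of them calls accept. The function name
simulate_shared_listener and its parameter are hypothetical.

```python
import selectors
import socket


def simulate_shared_listener(num_listeners=3):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen(16)
    listener.setblocking(False)

    # One independent pollset per (simulated) listener thread/process,
    # all of them watching the same listening socket.
    pollsets = []
    for _ in range(num_listeners):
        pollset = selectors.DefaultSelector()
        pollset.register(listener, selectors.EVENT_READ)
        pollsets.append(pollset)

    # A single incoming connection.
    client = socket.create_connection(listener.getsockname())

    # Every pollset is polled before anyone accepts, imitating listeners
    # that notice the event at (roughly) the same time.
    woken = [bool(pollset.select(timeout=1.0)) for pollset in pollsets]

    # Now every simulated listener tries to accept the same connection.
    outcomes = []
    for _ in pollsets:
        try:
            conn, _addr = listener.accept()
            conn.close()
            outcomes.append("accepted")
        except BlockingIOError:          # EAGAIN/EWOULDBLOCK on the listener
            outcomes.append("EAGAIN")

    client.close()
    for pollset in pollsets:
        pollset.close()
    listener.close()
    return woken, outcomes
```

In this sequential setup every pollset reports the listener as readable,
the first accept succeeds and the remaining ones fail with EAGAIN. It only
shows what pollsets see when they poll before the connection is accepted;
it says nothing about how the kernel schedules listeners that are already
blocked in epoll_wait, which is exactly the part I am unsure about.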

I know that my understanding of epoll/httpd is really wrong, but I am still
not sure where. If you still have patience (and time), would you mind
pointing me to a snippet of code that could resolve my doubts? I am asking
tons of questions because I'd like to write the most precise info in the
docs without risking confusing more readers like me (for example in
http://httpd.apache.org/docs/current/misc/perf-tuning.html#runtime).

Thanks a lot!

Regards,

Luca

Re: Thundering herd and MPMs (for dummies)

Posted by Yann Ylavic <yl...@gmail.com>.
On Sat, Apr 16, 2016 at 2:17 PM, Yann Ylavic <yl...@gmail.com> wrote:
> Hi Luca,
>
> On Sat, Apr 16, 2016 at 12:07 PM, Luca Toscano <to...@gmail.com> wrote:
>> The sockets are non blocking and without any guard before the
>> apr_pollset_poll (between processes I mean) there might be the risk of
>> having two or more listener threads trying to accept the same new
>> connection, ending up in only one proceeding and the rest getting EAGAIN.
>
> On modern systems, the thundering hurd is not an issue anymore (does
> not happen).
> There won't be multiple listeners (threads or processes) woken up at
> the same time for the same incoming connection when
> epoll_wait()/kevent()/... are used (see the corresponding man pages,
> EAGAIN is not a possible error, while it is for eg. poll()).
> So when accept() is called, we can be sure , so a fortiori for epoll()+accept().

[Sorry, unexpected send...]
So when accept() is called, we can be sure that a connection is available.

>
> Since, as you noticed, mpm_event is meant for modern systems, not
> ACCEPT_MUTEX is implemented.

Re: Thundering herd and MPMs (for dummies)

Posted by Yann Ylavic <yl...@gmail.com>.
Hi Luca,

On Sat, Apr 16, 2016 at 12:07 PM, Luca Toscano <to...@gmail.com> wrote:
> The sockets are non blocking and without any guard before the
> apr_pollset_poll (between processes I mean) there might be the risk of
> having two or more listener threads trying to accept the same new
> connection, ending up in only one proceeding and the rest getting EAGAIN.

On modern systems, the thundering herd is not an issue anymore (does
not happen).
There won't be multiple listeners (threads or processes) woken up at
the same time for the same incoming connection when
epoll_wait()/kevent()/... are used (see the corresponding man pages,
EAGAIN is not a possible error, while it is for e.g. poll()).
So when accept() is called, we can be sure , so a fortiori for epoll()+accept().

Since, as you noticed, mpm_event is meant for modern systems, no
ACCEPT_MUTEX is implemented.

Re: Thundering herd and MPMs (for dummies)

Posted by Luca Toscano <to...@gmail.com>.
2016-04-09 12:26 GMT+02:00 Luca Toscano <to...@gmail.com>:

> Hi Apache devs,
>
> as part of my documentation duties I would like to add some details about
> how connections are accepted in Event (for
> http://httpd.apache.org/docs/current/misc/perf-tuning.html and
> https://httpd.apache.org/docs/2.4/mod/event.html). I am not super expert
> with the MPMs code so what I am going to say could be terribly wrong,
> please be patient :)
>
> Everything started reading the documentation for SO_REUSEPORT, in which
> event seems the only MPM non getting the same performance improvements as
> worker/prefork. I tried to read the code to get a better idea about the
> why, and I tried to compare the various MPMs.
>
> My understanding is:
>
> 1) only one listening socket configured will leverage modern kernel
> features (when available) avoiding the use of a process mutex before accept
> (essentially serializing the accept calls).
> 2) multiple listening sockets configured needs some sort of control since
> processes/threads needs to know what socket is ready for accept (or other
> events of course). APR offers the apr_pollset_* functions to solve this
> problem, for example using select/[e]poll over multiple listening sockets.
> The thundering herd problem arises when select/[e]poll is used by all the
> processes/threads over the same listening sockets, because each event wakes
> all up at the same time causing extra kernel work (and cpu utilization).
>
> SAFE_ACCEPT() is usually implemented in the various MPMs to solve 2)
> essentially serializing select/[e]poll/accept calls with a mutex (like SysV
> semaphores). Prefork has to do this in all its processes because there is a
> 1:1 correspondence between process and connection served, meanwhile Worker
> is a bit smarter and delegates only one thread per process to the role of
> "listener", assigning the accepted connection/fd to the first worker thread
> available.
>
> I was a bit puzzled not finding any SAFE_ACCEPT in event, but eventually
> (and thanks to the #dev IRC channel) I think I know why: Event uses more
> recent epolls/kqueue/etc.. when available and sets the pollset to
> APR_SO_NONBLOCK; apr_pollset_poll is configured to just wait a little
> timeout to catch events, returning (and not blocking) otherwise. The only
> lock used is thread based (one per process) since the pollset is updated by
> the worker threads periodically (Keep alive, lingering close, etc. forces
> the worker to give back control of the fd to the listener to be free to do
> something else). The thundering herd issue in event is not really a major
> problem since it has been mitigated using a combination of timeouts and non
> blocking I/O in the listeners (plus the listeners are usually few compared
> to the total number of worker threads).
>
> Last but not the least,
> https://httpd.apache.org/docs/current/mod/core.html#mutex is therefore
> not needed when using event.
>
> Does this vaguely resemble reality? If not, is there anybody kind enough
> to give me some direction about what to look/read? Documentation will be
> updated in return, I promise :)
>

Sorry, me again; I still need to figure out whether my thoughts are correct.
The only doubt that I have (if the previous email is somehow correct) is how
multiple listening sockets are managed by the listener threads in event.
The sockets are non-blocking and, without any guard before the
apr_pollset_poll (between processes I mean), there might be the risk of
having two or more listener threads trying to accept the same new
connection, ending up with only one proceeding and the rest getting EAGAIN.
I tried to read the following piece of code multiple times but I still don't
have the full picture:

https://github.com/apache/httpd/blob/fcf259f90c120d6ad63b639be5f9f720c300b327/server/mpm/event/event.c#L1948

I am pretty sure that I am missing a lot of things, so any hint about where
to look would be really awesome. I'll try to add as much documentation as
possible in return :)

Thanks for the patience!

Luca