Posted to dev@httpd.apache.org by Yehezkel Horowitz <ho...@checkpoint.com> on 2015/10/19 17:13:32 UTC

Improve Apache performance on high load (prefork MPM) with multiple Accept mutexes (Patch attached)

Hello Apache gurus.

I was working on a project that used Apache 2.2.x with the prefork MPM (using flock as the mutex method) on a 20-core Linux machine, and ran into the following problem.

Under load, once the number of Apache child processes grew beyond a certain point (~3000 processes), Apache no longer accepted incoming connections in a reasonable time (they showed up in netstat as SYN_RECV).

I found the Apache Performance Tuning document [1], which suggests the following idea for improving performance:
"Another solution that has been considered but never implemented is to partially serialize the loop -- that is, let in a certain number of processes. This would only be of interest on multiprocessor boxes where it's possible that multiple children could run simultaneously, and the serialization actually doesn't take advantage of the full bandwidth. This is a possible area of future investigation, but priority remains low because highly parallel web servers are not the norm."

I wrote a small patch (against 2.2.31) that implements this idea: it creates 4 mutexes and spreads the child processes across them (by getpid() % mutex_number).
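
To make the distribution rule concrete, here is a minimal sketch of the "getpid() % mutex_number" mapping. It is not taken from the patch itself: the names accept_mutex_index and NUM_ACCEPT_MUTEXES are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical constant: the number of accept mutexes (4 in this post). */
#define NUM_ACCEPT_MUTEXES 4

/*
 * Map a child process ID to the index of the accept mutex it should use.
 * Children whose PIDs share the same remainder compete for the same mutex.
 */
static unsigned int accept_mutex_index(pid_t pid, unsigned int num_mutexes)
{
    assert(num_mutexes > 0);            /* avoid division by zero */
    return (unsigned int)pid % num_mutexes;
}
```

A child would call accept_mutex_index(getpid(), NUM_ACCEPT_MUTEXES) once after fork and keep using that mutex for its lifetime. Since the kernel does not hand out PIDs with any guarantee about their remainders, the groups may be uneven in size.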

So at any given time, 4 idle child processes are expected [2] to be waiting in the "select loop".
Once a new connection arrives, the OS wakes these 4 processes: 1 will succeed in accepting the connection (and will release its mutex), and the other 3 will return to the "select loop".

This solved my specific problem and allowed the machine to handle more load.

My questions to this forum are:

1. Do you think this is a good implementation of the suggested idea?

2. Are there any pitfalls I missed?

3. Would you consider accepting this patch into the project?
   If so, could you guide me on what else needs to be done for acceptance?
   I know configuration and documentation work is needed; I'll do it once the patch is approved.

4. Do you think '4' is a good default for the number of mutexes? What considerations should determine the default?

5. Is such an implementation relevant for other MPMs (worker/event)?

Any other feedback is welcome.

[1] http://httpd.apache.org/docs/2.2/misc/perf-tuning.html, section "accept Serialization - Multiple Sockets".
[2] There is no guarantee that exactly 4 processes will be waiting, since all the processes mapped to a given mutex (for example, those with "getpid() % mutex_number == 0") might be busy at a given time. But this seems to me like a fair limitation.

Note: flock gave me the best results, but its cost still seems to grow as n^2 (where 'n' is the number of waiting processes), so reducing the number of processes waiting on each mutex gives a better-than-linear improvement.
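
As a rough worked estimate of the note above, assume (as an approximation only, based on my impression rather than a measurement) that the locking cost for n waiters is C(n) = c n^2 for some constant c, and that N waiting processes are split evenly across k mutexes:

```latex
\begin{align*}
\text{one mutex:}\quad & C(N) = c\,N^{2} \\
\text{each of } k \text{ mutexes:}\quad & C\!\left(\tfrac{N}{k}\right) = c\,\frac{N^{2}}{k^{2}} \\
\text{all } k \text{ mutexes together:}\quad & k \cdot c\,\frac{N^{2}}{k^{2}} = c\,\frac{N^{2}}{k} \\[4pt]
N = 3000,\ k = 4:\quad & \frac{C(750)}{C(3000)} = \frac{1}{16}, \qquad
\frac{4\,C(750)}{C(3000)} = \frac{1}{4}
\end{align*}
```

Under that assumption, each mutex becomes about 16 times cheaper and the total cost about 4 times cheaper with 4 mutexes. If the PIDs are not spread evenly, the gain is smaller.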

Regards,

Yehezkel Horowitz
Check Point Software Technologies Ltd.