Posted to dev@apr.apache.org by Brian Pane <bp...@pacbell.net> on 2001/07/19 21:12:45 UTC
Pool allocation bottlenecks Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev
William A. Rowe, Jr. wrote:
>From: "Justin Erenkrantz" <je...@ebuilt.com>
>Sent: Thursday, July 19, 2001 1:06 PM
>
>
>>I wouldn't recommend using the threaded code at all because we are still
>>doing a per-process allocation mutex which causes threaded to become
>>useless. When that is changed (i.e. we enable SMS), I think that
>>threaded MPM will deserve to be beat up and tested. -- justin
>>
>
>Tag and roll today, and enable SMS. This is now a bottleneck, and no doubt
>SMS will _significantly_ help us out with the threading/locking performance
>issues.
>
It's worth noting that, for non-server-parsed content, apr_palloc
(in the original, non-SMS implementation) doesn't actually have to
acquire a lock very often. From gprof,
                0.00    0.00     994/1588875     apr_palloc [31]
                0.00    0.00   87710/1588875     apr_file_read [5]
                0.00    0.00  500048/1588875     apr_pool_destroy <cycle 5> [145]
                0.00    0.00  500049/1588875     free_blocks [91]
                0.00    0.00  500074/1588875     apr_pool_sub_make [143]
[87]     0.0    0.00    0.00 1588875             apr_lock_acquire [87]
The numbers mean that, out of 1,588,875 calls to apr_lock_acquire,
only 994 came from apr_palloc.
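To put that ratio in perspective, here is a tiny sketch of the arithmetic. The helper name is made up for illustration; it is not part of APR or gprof.

```c
/* Hypothetical helper (not an APR or gprof function): the percentage of
 * a callee's total calls that one caller accounts for, taken from the
 * "called/total" column of a gprof call-graph line. */
static double caller_share_percent(unsigned long calls, unsigned long total)
{
    if (total == 0) {
        return 0.0;   /* avoid dividing by zero on an empty profile */
    }
    return 100.0 * (double)calls / (double)total;
}
```

With the figures above, caller_share_percent(994, 1588875) comes out to roughly 0.06%, so apr_palloc accounts for a negligible share of the lock acquisitions in this test.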
For a test using server-parsed requests, the pattern is very different:
                0.00    0.00    87710/14587902     apr_file_read [9]
                0.00    0.00  3000048/14587902     apr_pool_destroy <cycle 5> [22]
                0.00    0.00  3000074/14587902     apr_pool_sub_make [31]
                0.00    0.00  4000049/14587902     free_blocks [28]
                0.00    0.00  4500021/14587902     apr_palloc [27]
[13]    25.0    0.00    0.01 14587902              apr_lock_acquire [13]
Here, apr_palloc is doing a lot of locking, so a thread-specific, lock-free
source of additional blocks for an SMS will help a lot.
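One way such a thread-specific source of blocks might look is sketched below. This is only an illustration of the idea, not the SMS code: the block layout, the names block_get/block_put, and the block size are all assumptions, and it relies on C11 `_Thread_local`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block header; not APR's actual layout. */
typedef struct block {
    struct block *next;
    size_t size;
} block_t;

#define BLOCK_SIZE 8192   /* assumed payload size, chosen arbitrarily */

/* Each thread gets its own list head, so no other thread can touch it. */
static _Thread_local block_t *free_list = NULL;

/* Take a block from this thread's cache without acquiring any lock.
 * malloc stands in for the shared (and possibly locked) fallback source. */
static block_t *block_get(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
        b->next = NULL;
        return b;
    }
    b = malloc(sizeof(*b) + BLOCK_SIZE);
    if (b != NULL) {
        b->next = NULL;
        b->size = BLOCK_SIZE;
    }
    return b;
}

/* Return a block to this thread's cache; again, no lock is needed. */
static void block_put(block_t *b)
{
    b->next = free_list;
    free_list = b;
}
```

A block released with block_put is handed back by the next block_get on the same thread, so the common case never touches shared state. A real implementation would also have to bound the cache and release it when the thread exits.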
Some thoughts based on the numbers:
* For anybody working on tuning the SMS implementation, I highly
recommend incorporating mod_include into your test cases.
* Creating and destroying pools is the major bottleneck for
non-server-parsed requests. In order to achieve big speedups
in the httpd, the SMS implementation needs to make sub-pool
creation and destruction faster than the original pool design.
* In the non-server-parsed case, apr_palloc is one of the most
time-consuming functions in the httpd. Keep in mind that it
almost never (in this test case) has to acquire a lock and
call new_block; instead, it's usually taking the fast path
through the code that requires just a few arithmetic and pointer
operations. While it's probably possible to tune the code
a bit, it's arguably close to optimal already. What this
means to me is that the real optimization opportunity for
non-server-parsed content is not to make apr_palloc faster,
but rather to stop calling apr_palloc so much.
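For readers who haven't looked at the allocator, the fast path described in the last point is roughly a pointer bump. The sketch below is a simplified stand-in, not the real apr_palloc: the toy_pool type, the 8-byte alignment, and returning NULL instead of taking the locked new_block path are all assumptions made to keep it short.

```c
#include <stddef.h>

/* Hypothetical, stripped-down pool: one block with a bump pointer. */
typedef struct {
    char *first_avail;   /* next free byte in the current block */
    char *endp;          /* one past the last usable byte */
} toy_pool;

/* Round n up to a multiple of 8 (assumed alignment). */
#define ALIGN8(n) (((n) + 7u) & ~(size_t)7u)

static void *toy_palloc(toy_pool *p, size_t reqsize)
{
    size_t size = ALIGN8(reqsize);

    if (size < reqsize) {
        return NULL;     /* size overflowed while rounding up */
    }
    /* Fast path: one comparison, one addition, no lock. */
    if (size <= (size_t)(p->endp - p->first_avail)) {
        void *result = p->first_avail;
        p->first_avail += size;
        return result;
    }
    /* Slow path: a real pool would acquire the lock and fetch a new
     * block here; this sketch simply reports failure. */
    return NULL;
}
```

With so little work per call, there isn't much left to shave off the function itself, which is why reducing the number of calls looks like the better lever.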
--Brian
Re: Pool allocation bottlenecks Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev
Posted by Justin Erenkrantz <je...@ebuilt.com>.
[ Dropping new-httpd ]
> For a test using server-parsed requests, the pattern is very different:
>                 0.00    0.00    87710/14587902     apr_file_read [9]
>                 0.00    0.00  3000048/14587902     apr_pool_destroy <cycle 5> [22]
>                 0.00    0.00  3000074/14587902     apr_pool_sub_make [31]
>                 0.00    0.00  4000049/14587902     free_blocks [28]
>                 0.00    0.00  4500021/14587902     apr_palloc [27]
> [13]    25.0    0.00    0.01 14587902              apr_lock_acquire [13]
The pool_destroy and sub_make code shouldn't need to contend for a lock
when creating or destroying sub-pools. The locks may still be present,
but their scope will now be thread-local, so there should be very little
(if any) contention on them. At least that's the intention.
-- justin