Posted to dev@apr.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2001/07/19 20:39:07 UTC
Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev
From: "Justin Erenkrantz" <je...@ebuilt.com>
Sent: Thursday, July 19, 2001 1:06 PM
> I wouldn't recommend using the threaded code at all because we are still
> doing a per-process allocation mutex which causes threaded to become
> useless. When that is changed (i.e. we enable SMS), I think that
> threaded MPM will deserve to be beat up and tested. -- justin
Tag and roll today, and enable SMS. This is now a bottleneck, and no doubt
SMS will _significantly_ help us out with the threading/locking performance
issues. The tree is stable so let users bang on it, but let's get SMS turned
on sooner rather than later, since the longer it is in use, the more quickly
bugs will be flushed out.
Re: Pool allocation bottlenecks Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev
Posted by Justin Erenkrantz <je...@ebuilt.com>.
[ Dropping new-httpd ]
> For a test using server-parsed requests, the pattern is very different:
> 0.00 0.00 87710/14587902 apr_file_read [9]
> 0.00 0.00 3000048/14587902 apr_pool_destroy <cycle 5> [22]
> 0.00 0.00 3000074/14587902 apr_pool_sub_make [31]
> 0.00 0.00 4000049/14587902 free_blocks [28]
> 0.00 0.00 4500021/14587902 apr_palloc [27]
> [13] 25.0 0.00 0.01 14587902 apr_lock_acquire [13]
The pool_destroy and sub_make code shouldn't need to contend for a lock
to do destruction. The locks may still be present, but their scope will
now be thread-local, so there should be very little (if any) contention
on them. At least that's the intention.
-- justin
Pool allocation bottlenecks Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev
Posted by Brian Pane <bp...@pacbell.net>.
William A. Rowe, Jr. wrote:
>From: "Justin Erenkrantz" <je...@ebuilt.com>
>Sent: Thursday, July 19, 2001 1:06 PM
>
>
>>I wouldn't recommend using the threaded code at all because we are still
>>doing a per-process allocation mutex which causes threaded to become
>>useless. When that is changed (i.e. we enable SMS), I think that
>>threaded MPM will deserve to be beat up and tested. -- justin
>>
>
>Tag and roll today, and enable SMS. This is now a bottleneck, and no doubt
>SMS will _significantly_ help us out with the threading/locking performance
>issues.
>
It's worth noting that, for non-server-parsed content, apr_palloc
(in the original, non-SMS implementation) doesn't actually have to
acquire a lock very often. From gprof,
0.00 0.00 994/1588875 apr_palloc [31]
0.00 0.00 87710/1588875 apr_file_read [5]
0.00 0.00 500048/1588875 apr_pool_destroy <cycle 5> [145]
0.00 0.00 500049/1588875 free_blocks [91]
0.00 0.00 500074/1588875 apr_pool_sub_make [143]
[87] 0.0 0.00 0.00 1588875 apr_lock_acquire [87]
The numbers mean that, out of 1,588,875 calls to apr_lock_acquire,
994 of them were from apr_palloc.
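
To make the "calls/total" column easier to compare, the counts can be turned into a percentage of all lock acquisitions. This is only a small arithmetic sketch; the function name lock_share_percent is made up for illustration and is not part of APR or gprof.

```c
#include <assert.h>

/* Share of all lock acquisitions attributed to one caller, in percent.
 * The arguments are the "calls/total" pair from a gprof call-graph line.
 * Hypothetical helper for illustration only. */
double lock_share_percent(unsigned long caller_calls,
                          unsigned long total_calls)
{
    assert(total_calls > 0 && caller_calls <= total_calls);
    return 100.0 * (double)caller_calls / (double)total_calls;
}
```

For the listing above, lock_share_percent(994, 1588875) comes out at roughly 0.06 percent, which is why apr_palloc is effectively lock-free in this test.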
For a test using server-parsed requests, the pattern is very different:
0.00 0.00 87710/14587902 apr_file_read [9]
0.00 0.00 3000048/14587902 apr_pool_destroy <cycle 5> [22]
0.00 0.00 3000074/14587902 apr_pool_sub_make [31]
0.00 0.00 4000049/14587902 free_blocks [28]
0.00 0.00 4500021/14587902 apr_palloc [27]
[13] 25.0 0.00 0.01 14587902 apr_lock_acquire [13]
Here, apr_palloc is doing a lot of locking, so a thread-specific, lock-free
source of additional blocks for an SMS will help a lot.
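
As a rough illustration of what a thread-specific, lock-free block source could look like, here is a minimal sketch that keeps one free list per thread using C11 _Thread_local. It is not the SMS implementation or any APR API; BLOCK_SIZE, block_get, block_put and block_reuse_count are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 8192  /* illustrative fixed block size, not APR's value */

typedef struct block {
    struct block *next;  /* link used only while the block sits on a free list */
} block_t;

/* One free list per thread. Only the owning thread ever touches its list,
 * so neither get nor put needs a lock. */
static _Thread_local block_t *free_list = NULL;
static _Thread_local unsigned long reuse_count = 0;

/* Hand out a block: reuse one from this thread's list if possible,
 * otherwise fall back to the system allocator. */
void *block_get(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
        reuse_count++;
        return b;
    }
    return malloc(BLOCK_SIZE);
}

/* Return a block to the calling thread's free list. */
void block_put(void *p)
{
    block_t *b = p;
    assert(b != NULL);
    b->next = free_list;
    free_list = b;
}

/* Number of times this thread satisfied a request from its own list. */
unsigned long block_reuse_count(void)
{
    return reuse_count;
}
```

The trade-off in this sketch is that blocks freed by one thread are not visible to the others, so memory can sit idle on a quiet thread's list; a real design would need some policy for that.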
Some thoughts based on the numbers:
* For anybody working on tuning the SMS implementation, I highly
recommend incorporating mod_include into your test cases.
* Creating and destroying pools is the major bottleneck for
non-server-parsed requests. In order to achieve big speedups
in the httpd, the SMS implementation needs to make sub-pool
creation and destruction faster than the original pool design.
* In the non-server-parsed case, apr_palloc is one of the most
time-consuming functions in the httpd. Keep in mind that it
almost never (in this test case) has to acquire a lock and
call new_block; instead, it's usually taking the fast path
through the code that requires just a few arithmetic and pointer
operations. While it's probably possible to tune the code
a bit, it's arguably close to optimal already. What this
means to me is that the real optimization opportunity for
non-server-parsed content is not to make apr_palloc faster,
but rather to stop calling apr_palloc so much.
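
The fast path described in the last point can be sketched as a bump-pointer allocator: an allocation is a size round-up, a bounds check and a pointer increment, and only when the current block is exhausted does the slow path fetch a new block (the point where the original pool code would take a lock and call new_block). This is an illustrative sketch, not the APR source; pool_t, pool_init, pool_alloc and pool_destroy are hypothetical names, and the 8-byte alignment is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct blk {
    struct blk *next;            /* chain of blocks owned by the pool */
} blk_t;

typedef struct pool {
    blk_t *blocks;               /* all blocks, freed together on destroy */
    char *first_avail;           /* next free byte in the current block */
    char *end;                   /* one past the last usable byte */
    size_t block_size;           /* default payload size of a new block */
    unsigned long slow_path_calls;
} pool_t;

/* Round up to a multiple of 8 (assumed alignment for this sketch). */
static size_t align8(size_t n) { return (n + 7u) & ~(size_t)7u; }

void pool_init(pool_t *p, size_t block_size)
{
    p->blocks = NULL;
    p->first_avail = NULL;
    p->end = NULL;
    p->block_size = block_size;
    p->slow_path_calls = 0;
}

/* Slow path: get a fresh block. A shared block source would lock here. */
static int pool_new_block(pool_t *p, size_t min_size)
{
    size_t payload = min_size > p->block_size ? min_size : p->block_size;
    size_t header = align8(sizeof(blk_t));
    blk_t *b = malloc(header + payload);
    if (b == NULL)
        return -1;
    b->next = p->blocks;
    p->blocks = b;
    p->first_avail = (char *)b + header;
    p->end = p->first_avail + payload;
    p->slow_path_calls++;
    return 0;
}

void *pool_alloc(pool_t *p, size_t size)
{
    size_t n = align8(size);
    /* Fast path: a comparison and a pointer bump, no lock. */
    if (p->first_avail == NULL || (size_t)(p->end - p->first_avail) < n) {
        if (pool_new_block(p, n) != 0)
            return NULL;
    }
    void *mem = p->first_avail;
    p->first_avail += n;
    return mem;
}

/* Free every block at once; individual allocations are never freed. */
void pool_destroy(pool_t *p)
{
    blk_t *b = p->blocks;
    while (b != NULL) {
        blk_t *next = b->next;
        free(b);
        b = next;
    }
    pool_init(p, p->block_size);
}
```

With a fast path this short there is little left to shave off per call, which is consistent with the point above that the remaining win is in making fewer calls.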
--Brian
Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev
Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Thu, Jul 19, 2001 at 01:39:07PM -0500, William A. Rowe, Jr. wrote:
> Tag and roll today, and enable SMS. This is now a bottleneck, and no doubt
> SMS will _significantly_ help us out with the threading/locking performance
> issues. The tree is stable so let users bang on it, but let's get SMS turned
> on sooner rather than later, since the longer it is in use, the more quickly
> bugs will be flushed out.
+1. Time to go bug-hunting. -- justin
Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev
Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: "William A. Rowe, Jr." <wr...@rowe-clan.net>
Sent: Thursday, July 19, 2001 1:39 PM
> From: "Justin Erenkrantz" <je...@ebuilt.com>
> Sent: Thursday, July 19, 2001 1:06 PM
>
>
> > I wouldn't recommend using the threaded code at all because we are still
> > doing a per-process allocation mutex which causes threaded to become
> > useless. When that is changed (i.e. we enable SMS), I think that
> > threaded MPM will deserve to be beat up and tested. -- justin
>
> Tag and roll today, and enable SMS.
make that ... "and _then_ enable SMS"