Posted to dev@apr.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2001/07/19 20:39:07 UTC

Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev

From: "Justin Erenkrantz" <je...@ebuilt.com>
Sent: Thursday, July 19, 2001 1:06 PM


> I wouldn't recommend using the threaded code at all because we are still
> doing a per-process allocation mutex which causes threaded to become
> useless.  When that is changed (i.e. we enable SMS), I think that 
> threaded MPM will deserve to be beat up and tested.  -- justin

Tag and roll today, and enable SMS.  This is now a bottleneck, and no doubt
SMS will _significantly_ help us out with the threading/locking performance
issues.  The tree is stable, so let users bang on it, but let's get SMS turned
on sooner rather than later, since the longer it is in use, the more quickly
bugs will be flushed out.


Re: Pool allocation bottlenecks Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev

Posted by Justin Erenkrantz <je...@ebuilt.com>.
[ Dropping new-httpd ]

> For a test using server-parsed requests, the pattern is very different:
>                 0.00    0.00   87710/14587902     apr_file_read [9]
>                 0.00    0.00 3000048/14587902     apr_pool_destroy <cycle 5> [22]
>                 0.00    0.00 3000074/14587902     apr_pool_sub_make [31]
>                 0.00    0.00 4000049/14587902     free_blocks [28]
>                 0.00    0.00 4500021/14587902     apr_palloc [27]
> [13]    25.0    0.00    0.01 14587902         apr_lock_acquire [13]

The pool_destroy and sub_make code shouldn't have to contend for a lock.
The locks may still be present, but their scope will now be thread-local,
so there should be very little (if any) contention on them.  At least
that's the intention.
-- justin


Pool allocation bottlenecks Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev

Posted by Brian Pane <bp...@pacbell.net>.
William A. Rowe, Jr. wrote:

>From: "Justin Erenkrantz" <je...@ebuilt.com>
>Sent: Thursday, July 19, 2001 1:06 PM
>
>
>>I wouldn't recommend using the threaded code at all because we are still
>>doing a per-process allocation mutex which causes threaded to become
>>useless.  When that is changed (i.e. we enable SMS), I think that 
>>threaded MPM will deserve to be beat up and tested.  -- justin
>>
>
>Tag and roll today, and enable SMS.  This is now a bottleneck, and no doubt
>SMS will _significantly_ help us out with the threading/locking performance
>issues.
>
It's worth noting that, for non-server-parsed content, apr_palloc
(in the original, non-SMS implementation) doesn't actually have to
acquire a lock very often.  From gprof,

                0.00    0.00     994/1588875     apr_palloc [31]
                0.00    0.00   87710/1588875     apr_file_read [5]
                0.00    0.00  500048/1588875     apr_pool_destroy <cycle 5> [145]
                0.00    0.00  500049/1588875     free_blocks [91]
                0.00    0.00  500074/1588875     apr_pool_sub_make [143]
[87]     0.0    0.00    0.00 1588875         apr_lock_acquire [87]

The numbers mean that, out of 1,588,875 calls to apr_lock_acquire,
only 994 came from apr_palloc.

For a test using server-parsed requests, the pattern is very different:
                0.00    0.00   87710/14587902     apr_file_read [9]
                0.00    0.00 3000048/14587902     apr_pool_destroy <cycle 5> [22]
                0.00    0.00 3000074/14587902     apr_pool_sub_make [31]
                0.00    0.00 4000049/14587902     free_blocks [28]
                0.00    0.00 4500021/14587902     apr_palloc [27]
[13]    25.0    0.00    0.01 14587902         apr_lock_acquire [13]

Here, apr_palloc is doing a lot of locking, so a thread-specific, lock-free
source of additional blocks for an SMS will help a lot.

Some thoughts based on the numbers:

  * For anybody working on tuning the SMS implementation, I highly
    recommend incorporating mod_include into your test cases.

  * Creating and destroying pools is the major bottleneck for
    non-server-parsed requests.  In order to achieve big speedups
    in the httpd, the SMS implementation needs to make sub-pool
    creation and destruction faster than the original pool design.

  * In the non-server-parsed case, apr_palloc is one of the most
    time-consuming functions in the httpd.  Keep in mind that it
    almost never (in this test case) has to acquire a lock and
    call new_block; instead, it's usually taking the fast path
    through the code that requires just a few arithmetic and pointer
    operations.  While it's probably possible to tune the code
    a bit, it's arguably close to optimal already.  What this
    means to me is that the real optimization opportunity for
    non-server-parsed content is not to make apr_palloc faster,
    but rather to stop calling apr_palloc so much.

--Brian




Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Thu, Jul 19, 2001 at 01:39:07PM -0500, William A. Rowe, Jr. wrote:
> Tag and roll today, and enable SMS.  This is now a bottleneck, and no doubt
> SMS will _significantly_ help us out with the threading/locking performance
> issues.  The tree is stable, so let users bang on it, but let's get SMS turned
> on sooner rather than later, since the longer it is in use, the more quickly
> bugs will be flushed out.

+1.  Time to go bug-hunting.  -- justin


Re: Tag 2.0.21 was Re: daedalus is back on 2.0.21-dev

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
From: "William A. Rowe, Jr." <wr...@rowe-clan.net>
Sent: Thursday, July 19, 2001 1:39 PM


> From: "Justin Erenkrantz" <je...@ebuilt.com>
> Sent: Thursday, July 19, 2001 1:06 PM
> 
> 
> > I wouldn't recommend using the threaded code at all because we are still
> > doing a per-process allocation mutex which causes threaded to become
> > useless.  When that is changed (i.e. we enable SMS), I think that 
> > threaded MPM will deserve to be beat up and tested.  -- justin
> 
> Tag and roll today, and enable SMS.

make that ... "and _then_ enable SMS"

