You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Bill Stoddard <bi...@wstoddard.com> on 2000/12/07 17:38:20 UTC

Some performance analysis of Apache 2.0 (Windows NT 4.0)

Not sure how the column formatting will come though... The interesting numbers
are the 3rd column (SelfTime) and 5th columns (Number of calls to the
function).  Basically two threads handled all the requests in this run (due to
the LIFO scheduling of threads dispatched off a Windows i/o completion port) .
Since my machine has a single CPU, I would expect large numbers in
GetQueuedCompletionStatus and WaitForSingleObject.  As I suspected, that
FindFirstFile call is killing us and I'd like to get rid of it (replace it
with a more efficient call as in alpha5).

I'm a bit suprised at the time spend in the byterange_filter and
content_length_filters. Each filter seems to be called twice per request (this
run shows 2534 requests).  As expected, the additional mallocs and frees
introduced with the filters is chewing up cycles. Not sure why we have more
frees than mallocs though. Odd...


Name                                         Hier Time     % Hier Time
Self Time     % Self Time     # Calls
GetQueuedCompletionStatus     9.91 s             37.6                 9.91 s
37.6             2534   kernel32.dll
WaitForSingleObject                 9.02 s             34.2
9.02 s             34.2             2535   kernel32.dll
FindFirstFileA                           2.13 s               8.1
2.13 s                8.1            10140   kernel32.dll
CreateFileW                             1.20 s                4.6
1.20 s               4.6              5070   kernel32.dll
FindClose                                 616.75ms           2.3
616.75ms         2.3            10140   kernel32.dll
free                                           454.90ms          1.7
454.90ms         1.7             55758   msvcrtd.dll
_ap_byterange_filter                 443.25ms         1.7
432.87ms           1.6             5069
_ap_content_length_filter          432.65ms         1.6
419.15ms           1.6             5069
TransmitFile                             410.14ms         1.6
410.14ms             1.6            2535   mswsock.dll
CloseHandle                            351.85ms         1.3
351.85ms             1.3            5069   kernel32.dll
strlen                                        299.56ms         1.1
299.56ms             1.1         420810   msvcrtd.dll
malloc                                      265.18ms         1.0
265.18ms             1.0         48163   msvcrtd.dll
_stricmp                                   244.27ms         0.9
244.27ms             0.9         375180   msvcrtd.dll
AcceptEx                                 141.15ms         0.5
141.15ms             0.5         2534   mswsock.dll
GetFileInformationByHandle     136.73ms         0.5                 136.73ms
0.5         2535   kernel32.dll


Bill


Re: Some performance analysis of Apache 2.0 (Windows NT 4.0)

Posted by Bill Stoddard <bi...@wstoddard.com>.
Bleh... Let's try something different...

Name                                      Self Time    #Calls

GetQueuedCompletionStatus     9.91 s        2534
WaitForSingleObject                 9.02 s        2535
FindFirstFileA                           2.13 s       10140
CreateFileW                             1.20 s         5070
 FindClose                                616.75ms   10140
free                                          454.90ms   55758
 _ap_byterange_filter                432.87ms    5069
 _ap_content_length_filter         419.15ms    5069
TransmitFile                              410.14ms    2535
CloseHandle                            351.85ms     5069
strlen                                        299.56ms   420810
malloc                                      265.18ms     48163
 _stricmp                                   244.27ms   375180
AcceptEx                                 141.15ms       2534
GetFileInformationByHandle     136.73ms       2535

----- Original Message -----
From: Bill Stoddard <bi...@wstoddard.com>
To: <ne...@apache.org>
Sent: Thursday, December 07, 2000 11:38 AM
Subject: Some performance analysis of Apache 2.0 (Windows NT 4.0)


> Not sure how the column formatting will come though... The interesting
numbers
> are the 3rd column (SelfTime) and 5th columns (Number of calls to the
> function).  Basically two threads handled all the requests in this run (due
to
> the LIFO scheduling of threads dispatched off a Windows i/o completion port)
.
> Since my machine has a single CPU, I would expect large numbers in
> GetQueuedCompletionStatus and WaitForSingleObject.  As I suspected, that
> FindFirstFile call is killing us and I'd like to get rid of it (replace it
> with a more efficient call as in alpha5).
>
> I'm a bit suprised at the time spend in the byterange_filter and
> content_length_filters. Each filter seems to be called twice per request
(this
> run shows 2534 requests).  As expected, the additional mallocs and frees
> introduced with the filters is chewing up cycles. Not sure why we have more
> frees than mallocs though. Odd...
>

<snip>

> Bill
>


Re: Some performance analysis of Apache 2.0 (Windows NT 4.0)

Posted by Greg Stein <gs...@lyra.org>.
Here is Bill's note with the table formatted normally. Let's all take a
moment to chastise Bill for a sucky MUA and/or not simply typing text into
the darn thing :-)

Cheers,
-g

On Thu, Dec 07, 2000 at 11:38:20AM -0500, Bill Stoddard wrote:
> Not sure how the column formatting will come though... The interesting numbers
> are the 3rd column (SelfTime) and 5th columns (Number of calls to the
> function).  Basically two threads handled all the requests in this run (due to
> the LIFO scheduling of threads dispatched off a Windows i/o completion port) .
> Since my machine has a single CPU, I would expect large numbers in
> GetQueuedCompletionStatus and WaitForSingleObject.  As I suspected, that
> FindFirstFile call is killing us and I'd like to get rid of it (replace it
> with a more efficient call as in alpha5).
> 
> I'm a bit suprised at the time spend in the byterange_filter and
> content_length_filters. Each filter seems to be called twice per request (this
> run shows 2534 requests).  As expected, the additional mallocs and frees
> introduced with the filters is chewing up cycles. Not sure why we have more
> frees than mallocs though. Odd...
> 
>                             Hier   %Hier  Self   %Self
> Name                        Time    Time  Time    Time  #Calls
> GetQueuedCompletionStatus  9.91 s   37.6 9.91 s   37.6   2534   kernel32.dll
> WaitForSingleObject        9.02 s   34.2 9.02 s   34.2   2535   kernel32.dll
> FindFirstFileA             2.13 s    8.1 2.13 s    8.1  10140   kernel32.dll
> CreateFileW                1.20 s    4.6 1.20 s    4.6   5070   kernel32.dll
> FindClose                  616.75ms  2.3 616.75ms  2.3  10140   kernel32.dll
> free                       454.90ms  1.7 454.90ms  1.7  55758   msvcrtd.dll
> _ap_byterange_filter       443.25ms  1.7 432.87ms  1.6   5069
> _ap_content_length_filter  432.65ms  1.6 419.15ms  1.6   5069
> TransmitFile               410.14ms  1.6 410.14ms  1.6   2535   mswsock.dll
> CloseHandle                351.85ms  1.3 351.85ms  1.3   5069   kernel32.dll
> strlen                     299.56ms  1.1 299.56ms  1.1 420810   msvcrtd.dll
> malloc                     265.18ms  1.0 265.18ms  1.0  48163   msvcrtd.dll
> _stricmp                   244.27ms  0.9 244.27ms  0.9 375180   msvcrtd.dll
> AcceptEx                   141.15ms  0.5 141.15ms  0.5   2534   mswsock.dll
> GetFileInformationByHandle 136.73ms  0.5 136.73ms  0.5   2535   kernel32.dll
> 
> 
> Bill

-- 
Greg Stein, http://www.lyra.org/

Re: Some performance analysis of Apache 2.0 (Windows NT 4.0)

Posted by rb...@covalent.net.
> Not sure how the column formatting will come though... The interesting numbers
> are the 3rd column (SelfTime) and 5th columns (Number of calls to the
> function).  Basically two threads handled all the requests in this run (due to
> the LIFO scheduling of threads dispatched off a Windows i/o completion port) .
> Since my machine has a single CPU, I would expect large numbers in
> GetQueuedCompletionStatus and WaitForSingleObject.  As I suspected, that
> FindFirstFile call is killing us and I'd like to get rid of it (replace it
> with a more efficient call as in alpha5).
> 
> I'm a bit suprised at the time spend in the byterange_filter and
> content_length_filters. Each filter seems to be called twice per request (this
> run shows 2534 requests).  As expected, the additional mallocs and frees
> introduced with the filters is chewing up cycles. Not sure why we have more
> frees than mallocs though. Odd...

The byte-range filter should only be called once per request, assuming
that it isn't a byte-range request.  The content-length filter is called
every time down the stack, because this is where we compute the number of
bytes sent out the network.

One of the problems, is that we still aren't sending the EOS bucket down
the stack with the rest of the data, even if the whole request is
contained in once bucket.  Well, some of the generators do, but not most
of them.  This means that we are forced to have at least two calls down
the filter stack.

The mallocs and frees will go away in time.  We still have a few fish to
fry before we really start to optimize this code IMHO, and there are a LOT
of places to do some optimizing.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------