You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Bill Stoddard <bi...@wstoddard.com> on 2000/12/07 17:38:20 UTC
Some performance analysis of Apache 2.0 (Windows NT 4.0)
Not sure how the column formatting will come though... The interesting numbers
are the 3rd column (SelfTime) and 5th columns (Number of calls to the
function). Basically two threads handled all the requests in this run (due to
the LIFO scheduling of threads dispatched off a Windows i/o completion port) .
Since my machine has a single CPU, I would expect large numbers in
GetQueuedCompletionStatus and WaitForSingleObject. As I suspected, that
FindFirstFile call is killing us and I'd like to get rid of it (replace it
with a more efficient call as in alpha5).
I'm a bit suprised at the time spend in the byterange_filter and
content_length_filters. Each filter seems to be called twice per request (this
run shows 2534 requests). As expected, the additional mallocs and frees
introduced with the filters is chewing up cycles. Not sure why we have more
frees than mallocs though. Odd...
Name Hier Time % Hier Time
Self Time % Self Time # Calls
GetQueuedCompletionStatus 9.91 s 37.6 9.91 s
37.6 2534 kernel32.dll
WaitForSingleObject 9.02 s 34.2
9.02 s 34.2 2535 kernel32.dll
FindFirstFileA 2.13 s 8.1
2.13 s 8.1 10140 kernel32.dll
CreateFileW 1.20 s 4.6
1.20 s 4.6 5070 kernel32.dll
FindClose 616.75ms 2.3
616.75ms 2.3 10140 kernel32.dll
free 454.90ms 1.7
454.90ms 1.7 55758 msvcrtd.dll
_ap_byterange_filter 443.25ms 1.7
432.87ms 1.6 5069
_ap_content_length_filter 432.65ms 1.6
419.15ms 1.6 5069
TransmitFile 410.14ms 1.6
410.14ms 1.6 2535 mswsock.dll
CloseHandle 351.85ms 1.3
351.85ms 1.3 5069 kernel32.dll
strlen 299.56ms 1.1
299.56ms 1.1 420810 msvcrtd.dll
malloc 265.18ms 1.0
265.18ms 1.0 48163 msvcrtd.dll
_stricmp 244.27ms 0.9
244.27ms 0.9 375180 msvcrtd.dll
AcceptEx 141.15ms 0.5
141.15ms 0.5 2534 mswsock.dll
GetFileInformationByHandle 136.73ms 0.5 136.73ms
0.5 2535 kernel32.dll
Bill
Re: Some performance analysis of Apache 2.0 (Windows NT 4.0)
Posted by Bill Stoddard <bi...@wstoddard.com>.
Bleh... Let's try something different...
Name Self Time #Calls
GetQueuedCompletionStatus 9.91 s 2534
WaitForSingleObject 9.02 s 2535
FindFirstFileA 2.13 s 10140
CreateFileW 1.20 s 5070
FindClose 616.75ms 10140
free 454.90ms 55758
_ap_byterange_filter 432.87ms 5069
_ap_content_length_filter 419.15ms 5069
TransmitFile 410.14ms 2535
CloseHandle 351.85ms 5069
strlen 299.56ms 420810
malloc 265.18ms 48163
_stricmp 244.27ms 375180
AcceptEx 141.15ms 2534
GetFileInformationByHandle 136.73ms 2535
----- Original Message -----
From: Bill Stoddard <bi...@wstoddard.com>
To: <ne...@apache.org>
Sent: Thursday, December 07, 2000 11:38 AM
Subject: Some performance analysis of Apache 2.0 (Windows NT 4.0)
> Not sure how the column formatting will come though... The interesting
numbers
> are the 3rd column (SelfTime) and 5th columns (Number of calls to the
> function). Basically two threads handled all the requests in this run (due
to
> the LIFO scheduling of threads dispatched off a Windows i/o completion port)
.
> Since my machine has a single CPU, I would expect large numbers in
> GetQueuedCompletionStatus and WaitForSingleObject. As I suspected, that
> FindFirstFile call is killing us and I'd like to get rid of it (replace it
> with a more efficient call as in alpha5).
>
> I'm a bit suprised at the time spend in the byterange_filter and
> content_length_filters. Each filter seems to be called twice per request
(this
> run shows 2534 requests). As expected, the additional mallocs and frees
> introduced with the filters is chewing up cycles. Not sure why we have more
> frees than mallocs though. Odd...
>
<snip>
> Bill
>
Re: Some performance analysis of Apache 2.0 (Windows NT 4.0)
Posted by Greg Stein <gs...@lyra.org>.
Here is Bill's note with the table formatted normally. Let's all take a
moment to chastise Bill for a sucky MUA and/or not simply typing text into
the darn thing :-)
Cheers,
-g
On Thu, Dec 07, 2000 at 11:38:20AM -0500, Bill Stoddard wrote:
> Not sure how the column formatting will come though... The interesting numbers
> are the 3rd column (SelfTime) and 5th columns (Number of calls to the
> function). Basically two threads handled all the requests in this run (due to
> the LIFO scheduling of threads dispatched off a Windows i/o completion port) .
> Since my machine has a single CPU, I would expect large numbers in
> GetQueuedCompletionStatus and WaitForSingleObject. As I suspected, that
> FindFirstFile call is killing us and I'd like to get rid of it (replace it
> with a more efficient call as in alpha5).
>
> I'm a bit suprised at the time spend in the byterange_filter and
> content_length_filters. Each filter seems to be called twice per request (this
> run shows 2534 requests). As expected, the additional mallocs and frees
> introduced with the filters is chewing up cycles. Not sure why we have more
> frees than mallocs though. Odd...
>
> Hier %Hier Self %Self
> Name Time Time Time Time #Calls
> GetQueuedCompletionStatus 9.91 s 37.6 9.91 s 37.6 2534 kernel32.dll
> WaitForSingleObject 9.02 s 34.2 9.02 s 34.2 2535 kernel32.dll
> FindFirstFileA 2.13 s 8.1 2.13 s 8.1 10140 kernel32.dll
> CreateFileW 1.20 s 4.6 1.20 s 4.6 5070 kernel32.dll
> FindClose 616.75ms 2.3 616.75ms 2.3 10140 kernel32.dll
> free 454.90ms 1.7 454.90ms 1.7 55758 msvcrtd.dll
> _ap_byterange_filter 443.25ms 1.7 432.87ms 1.6 5069
> _ap_content_length_filter 432.65ms 1.6 419.15ms 1.6 5069
> TransmitFile 410.14ms 1.6 410.14ms 1.6 2535 mswsock.dll
> CloseHandle 351.85ms 1.3 351.85ms 1.3 5069 kernel32.dll
> strlen 299.56ms 1.1 299.56ms 1.1 420810 msvcrtd.dll
> malloc 265.18ms 1.0 265.18ms 1.0 48163 msvcrtd.dll
> _stricmp 244.27ms 0.9 244.27ms 0.9 375180 msvcrtd.dll
> AcceptEx 141.15ms 0.5 141.15ms 0.5 2534 mswsock.dll
> GetFileInformationByHandle 136.73ms 0.5 136.73ms 0.5 2535 kernel32.dll
>
>
> Bill
--
Greg Stein, http://www.lyra.org/
Re: Some performance analysis of Apache 2.0 (Windows NT 4.0)
Posted by rb...@covalent.net.
> Not sure how the column formatting will come though... The interesting numbers
> are the 3rd column (SelfTime) and 5th columns (Number of calls to the
> function). Basically two threads handled all the requests in this run (due to
> the LIFO scheduling of threads dispatched off a Windows i/o completion port) .
> Since my machine has a single CPU, I would expect large numbers in
> GetQueuedCompletionStatus and WaitForSingleObject. As I suspected, that
> FindFirstFile call is killing us and I'd like to get rid of it (replace it
> with a more efficient call as in alpha5).
>
> I'm a bit suprised at the time spend in the byterange_filter and
> content_length_filters. Each filter seems to be called twice per request (this
> run shows 2534 requests). As expected, the additional mallocs and frees
> introduced with the filters is chewing up cycles. Not sure why we have more
> frees than mallocs though. Odd...
The byte-range filter should only be called once per request, assuming
that it isn't a byte-range request. The content-length filter is called
every time down the stack, because this is where we compute the number of
bytes sent out the network.
One of the problems, is that we still aren't sending the EOS bucket down
the stack with the rest of the data, even if the whole request is
contained in once bucket. Well, some of the generators do, but not most
of them. This means that we are forced to have at least two calls down
the filter stack.
The mallocs and frees will go away in time. We still have a few fish to
fry before we really start to optimize this code IMHO, and there are a LOT
of places to do some optimizing.
Ryan
_______________________________________________________________________________
Ryan Bloom rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------