Posted to dev@httpd.apache.org by Bill Stoddard <st...@raleigh.ibm.com> on 2000/02/04 23:06:44 UTC

Some performance analysis

I've spent a bit of time analyzing Apache 2.0 performance on Windows. Here
is where we are spending time (from most to least). The indented line under
each function is the time spent per call (in uS) and the weighted time (time
per call * number of times called per request).

AcceptEx     612.81ms 2.0 612.81ms 2.0 5591   mswsock.dll
    109uS( *1 = 109uS)

TransmitFile 411.61ms 1.3 411.61ms 1.3 5591   mswsock.dll
    73.6uS(*1 = 73.6uS)

strlen 193.00ms 0.6 193.00ms 0.6 290732   msvcrtd.dll
    0.6uS(*52 = 31.2uS)

WSARecv 165.87ms 0.5 165.87ms 0.5 5591   ws2_32.dll
    29.7uS(*1 = 29.7uS)

memcpy 141.06ms 0.5 141.06ms 0.5 206867   msvcrtd.dll
    0.68uS(*37 = 25.16uS)

strchr 110.01ms 0.4 110.01ms 0.4 167730   msvcrtd.dll
    0.66uS(*30 = 19.8uS)

sscanf 68.42ms 0.2 68.42ms 0.2 5591   msvcrtd.dll
    12.2uS (*1 = 12.2uS)

_stricmp 51.71ms 0.2 51.71ms 0.2 519963 stricmp.asm stricmp.obj aprlib.dll
intel
    0.099uS(* 93 = 9.2uS)

FileTimeToSystemTime 48.87ms 0.2 48.87ms 0.2 11182   kernel32.dll
    4.37uS(*2 = 8.74uS)
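
The per-call and weighted figures above can be reproduced from the raw
profiler columns. The following is a minimal sketch, not part of the original
analysis. It assumes the first column is total function time in ms, the
integer column is the hit count, and the run served 5591 requests (inferred
from the once-per-request calls such as AcceptEx). The function names are
made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Per-call cost in microseconds: total function time (ms) / hit count. */
static double per_call_us(double total_ms, long hits)
{
    assert(hits > 0);
    return total_ms * 1000.0 / (double)hits;
}

/* Calls per request; the request count is an assumption (5591 above). */
static double calls_per_request(long hits, long requests)
{
    assert(requests > 0);
    return (double)hits / (double)requests;
}

/* Weighted cost per request in microseconds. */
static double weighted_us(double total_ms, long hits, long requests)
{
    return per_call_us(total_ms, hits) * calls_per_request(hits, requests);
}

/* Print one row in roughly the same shape as the indented lines above. */
static void print_row(const char *name, double total_ms, long hits,
                      long requests)
{
    printf("%s: %.3guS (*%.0f = %.4guS)\n", name,
           per_call_us(total_ms, hits),
           calls_per_request(hits, requests),
           weighted_us(total_ms, hits, requests));
}
```

For example, weighted_us(612.81, 5591, 5591) gives roughly 109.6, which
matches the AcceptEx row. Rows whose per-call time was rounded before
multiplying will differ slightly from the unrounded result.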

This is enough data to give you an initial feel for where we are spending
cycles. Apache on Linux is significantly faster (almost 100%) for small
numbers of concurrent clients (not tested with large numbers of concurrent
clients) with the simple braindead test I am running (fetching a 500 byte
file with apachebench). Adding the file handle cache to NT brings it close to
what I'm seeing with Linux.  I suspect Linux's speed is coming from the
efficiency of its network and file i/o system calls.

One other observation before I go. ap_rvputs is responsible for 32 calls to
memcpy and 32 calls to strlen per request. ap_rvputs is called 8 times per
request. Seems there is an opportunity for some performance tweaking here.
When I get the Windows port reasonably stable, I'd like to start
investigating putting the SGI fast path patches in. Oh, BTW, logging is OFF
in my testing here.

Enjoy!

Bill

________________________________________________
Bill Stoddard stoddard@raleigh.ibm.com

Come to the first official Apache Software Foundation
Conference!  <http://ApacheCon.Com/>




Re: Some performance analysis

Posted by Jeff Trawick <tr...@ibm.net>.
At least half of the strings passed to ap_rvputs() in the 2.0
tree are constant strings, where the length could be evaluated
at compile time.  Looking at a few sample calls in http_protocol.c,
it seems that the lengths of some of the other parameters are already
computed (or sometimes thrown away, like the return code from
ap_snprintf() for a buffer passed to ap_rvputs()).

Perhaps a new ap_rvputsl() would be appropriate for some callers,
where a series of (stringptr,stringlen) pairs followed by NULL is
passed in instead of a series of stringptr followed by NULL?
It would not be so concise, but needless computation is ugly in
its own way.
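
To make the idea concrete, here is a minimal standalone sketch of what such
a call could look like. It is not code from the tree: rvputsl_sketch() and
LIT_LEN() are hypothetical names, and the sketch copies into a plain char
buffer rather than the request's output buffer, since the real function
would take a request_rec and use the buffered output routines.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Length of a string literal, computed at compile time (excludes the NUL). */
#define LIT_LEN(s) (sizeof(s) - 1)

/*
 * Hypothetical sketch of an ap_rvputsl()-style call: takes (ptr, len)
 * pairs terminated by a NULL pointer and copies them into buf.
 * Returns the number of bytes written, or -1 if the pieces do not fit.
 * Every length argument must be passed as a size_t.
 */
static long rvputsl_sketch(char *buf, size_t bufsize, ...)
{
    va_list ap;
    size_t used = 0;
    const char *s;

    assert(buf != NULL);
    va_start(ap, bufsize);
    while ((s = va_arg(ap, const char *)) != NULL) {
        size_t len = va_arg(ap, size_t);  /* caller-supplied: no strlen() */
        if (len > bufsize - used) {
            va_end(ap);
            return -1;
        }
        memcpy(buf + used, s, len);
        used += len;
    }
    va_end(ap);
    return (long)used;
}
```

A caller would write something like
rvputsl_sketch(buf, sizeof(buf), "HTTP/1.1 ", LIT_LEN("HTTP/1.1 "),
status, (size_t)status_len, (const char *)NULL), where status_len could be
a length the caller already has on hand, for instance one returned by a
formatting call. The memcpy per string remains; only the strlen goes away.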

(Way) Too ugly, or worth playing with, at least for important
callers in http_protocol.c?

I guess strlen() and memcpy() only show up because you compiled
for debug or profiling???  I wonder how that changes the real
performance.  Besides the cost to get to the dll, the dll code
might not be as fast; presumably the optimizer could have used the 
"right" registers for strlen/memcpy if they had been inline in the 
caller.
-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...