Posted to dev@httpd.apache.org by Brian Pane <br...@apache.org> on 2002/09/09 06:32:25 UTC
Performance profiling data for 2.0.41
I just ran some CPU profiling tests on 2.0.41 (Sander's "PRE1" tag).
For those working on performance tuning and/or mod_cache development,
the results show a few significant opportunities for improvement.
Test case:  httpd-2.0.41 on Linux 2.4/x86
            prefork MPM
            one client (ab -c1) requesting the same 1KB file repeatedly
Instrumentation: oprofile-0.3 http://oprofile.sourceforge.net/
Top 20 functions without mod_mem_cache enabled:
(The numbers indicate the percentage of httpd's total
user-mode CPU time contributed by each function.)
8.4% memcpy
3.5% ap_directory_walk
2.6% apr_palloc
2.2% memset
2.0% config_log_transaction
1.8% setsockopt
1.7% apr_table_get
1.5% apr_brigade_puts
1.4% apr_filepath_merge
1.3% apr_table_setn
1.1% core_output_filter
1.0% pcre_exec
1.0% match_headers
0.9% ap_get_mime_headers_core
0.9% add_any_filter_handle
0.9% match
0.8% ap_read_request
0.8% merge_core_dir_configs
0.8% ap_http_header_filter
0.8% __offtime
Top 20 functions with mod_mem_cache enabled:
12.7% memcpy
3.4% apr_palloc
3.2% memset
2.9% config_log_transaction
2.5% apr_table_get
1.9% apr_brigade_puts
1.6% apr_pstrdup
1.5% apr_table_setn
1.4% core_output_filter
1.3% match_boyer_moore_horspool
1.3% MD5Transform
1.1% cache_url_handler
1.1% apr_brigade_write
1.0% add_any_filter_handle
1.0% apr_brigade_cleanup
0.9% ap_http_header_filter
0.9% pthread_setcanceltype
0.8% cache_hash
0.8% process_item
0.8% ap_rgetline_core
Notes:
* This profiler doesn't collect call chain information, so it's
  not clear where all the memcpy calls are coming from. But that's
  probably worth investigating, as memcpy is by far the most
  expensive single function (8.4% of CPU time without the cache,
  12.7% with it).
* The time spent in each apr_palloc call is usually very small: just
  an if-statement and a bit of pointer arithmetic in the common
  case. So it ranks this high on the list only because it is
  called so frequently.
* The memset calls are from calloc (in the cache code) and
apr_pcalloc (everywhere else).
* Based on the data, ap_directory_walk and config_log_transaction
could use some more optimization work. These functions have
been rising toward the top of the profile listings with each
2.0.x release--not because they've gotten slower, but because
other things have gotten faster.
* I've run out of optimization ideas for the apr_table functions.
* apr_brigade_puts gets called many times per request, at least
  twice per line of the response header. I have some ideas for
  fixing this by adding writev-style functionality to the
  brigade code.
* MD5Transform is used in generating cache keys. Is it possible
to use a cheaper hash function, or do we really need MD5 here?
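Since the profiler can't say who is calling memcpy, one low-tech workaround is to route suspect copies through a counting wrapper and dump per-call-site totals afterwards. This is a minimal sketch, assuming you are willing to rebuild with a macro substituted at the call sites you care about; COUNTED_MEMCPY, counted_memcpy, copy_site_t and dump_copy_sites are hypothetical names, not part of httpd or APR. It only sees copies that go through the macro, not copies made inside libc or emitted by the compiler for struct assignment.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-call-site counters for memcpy traffic. */
typedef struct {
    const char   *file;   /* __FILE__ of the call site */
    int           line;   /* __LINE__ of the call site */
    unsigned long calls;  /* number of copies made at this site */
    unsigned long bytes;  /* total bytes copied at this site */
} copy_site_t;

#define MAX_COPY_SITES 256
static copy_site_t copy_sites[MAX_COPY_SITES];
static int n_copy_sites;

/* Charge one copy to its call site, then do the real memcpy.
 * Sites beyond MAX_COPY_SITES are copied but not counted. */
static void *counted_memcpy(void *dst, const void *src, size_t n,
                            const char *file, int line)
{
    int i;

    assert(dst != NULL && src != NULL);  /* memcpy on NULL is undefined */
    for (i = 0; i < n_copy_sites; i++) {
        if (copy_sites[i].line == line && strcmp(copy_sites[i].file, file) == 0)
            break;
    }
    if (i == n_copy_sites && n_copy_sites < MAX_COPY_SITES) {
        copy_sites[i].file = file;       /* first time we see this site */
        copy_sites[i].line = line;
        n_copy_sites++;
    }
    if (i < n_copy_sites) {
        copy_sites[i].calls++;
        copy_sites[i].bytes += n;
    }
    return memcpy(dst, src, n);
}

#define COUNTED_MEMCPY(d, s, n) counted_memcpy((d), (s), (n), __FILE__, __LINE__)

/* Print one line per call site, e.g. from an atexit() handler. */
static void dump_copy_sites(FILE *out)
{
    for (int i = 0; i < n_copy_sites; i++)
        fprintf(out, "%s:%d  %lu calls  %lu bytes\n", copy_sites[i].file,
                copy_sites[i].line, copy_sites[i].calls, copy_sites[i].bytes);
}
```

Sorting the dumped lines by byte count would then point at the handful of sites worth looking at first.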
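To make the apr_palloc and memset notes concrete, here is a minimal bump-pointer arena in the same spirit: the common-case allocation is one bounds check plus an addition, and the zeroing variant is that same allocation followed by a memset. This is a simplified, hypothetical sketch (arena_t, arena_alloc, arena_calloc and friends are illustrative names), not APR's actual pool code, which chains multiple blocks and has a slow path for when a block fills up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-block arena; real APR pools chain many blocks. */
typedef struct {
    char  *base;  /* start of the block */
    size_t used;  /* bytes handed out so far */
    size_t size;  /* total bytes in the block */
} arena_t;

#define ARENA_ALIGN 8
#define ARENA_ALIGN_UP(n) (((n) + (ARENA_ALIGN - 1)) & ~(size_t)(ARENA_ALIGN - 1))

static int arena_init(arena_t *a, size_t size)
{
    a->base = malloc(size);
    a->used = 0;
    a->size = size;
    return a->base != NULL;
}

/* Common case: a bounds check and some pointer arithmetic. */
static void *arena_alloc(arena_t *a, size_t n)
{
    size_t need;
    void *p;

    if (n > a->size)             /* also keeps ARENA_ALIGN_UP from overflowing */
        return NULL;
    need = ARENA_ALIGN_UP(n);
    if (need > a->size - a->used)
        return NULL;             /* slow path omitted: a real pool adds a block */
    p = a->base + a->used;
    a->used += need;
    assert(a->used <= a->size);
    return p;
}

/* Zeroing variant, analogous to apr_pcalloc: same allocation,
 * plus a memset whose cost grows with the requested size. */
static void *arena_calloc(arena_t *a, size_t n)
{
    void *p = arena_alloc(a, n);
    if (p != NULL)
        memset(p, 0, n);
    return p;
}

/* Release everything at once, e.g. at the end of a request. */
static void arena_clear(arena_t *a) { a->used = 0; }

static void arena_destroy(arena_t *a)
{
    free(a->base);
    a->base = NULL;
}
```

With a per-call cost this small, the allocator itself can only climb a profile through call volume, while the zeroing variant also pays for the memset, so it may be worth checking whether every calloc/apr_pcalloc caller really needs zeroed memory.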
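One way to read the writev-style idea for response headers: instead of formatting each "Name: value" line with separate puts calls, record pointers to the pieces and hand the whole set off in one call. The sketch below uses a plain struct iovec array and writev(2) on a file descriptor purely for illustration; hdr_vec_t and its functions are hypothetical and are not the brigade API. It assumes the name and value strings stay valid until the flush, as request-pool strings would for the life of a request.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define HDR_MAX_IOV 64  /* hypothetical fixed capacity: 16 header lines */

typedef struct {
    struct iovec iov[HDR_MAX_IOV];
    int          cnt;
} hdr_vec_t;

static void hdr_vec_init(hdr_vec_t *hv) { hv->cnt = 0; }

/* Record a segment by pointer; nothing is copied. */
static void hdr_vec_push(hdr_vec_t *hv, const char *s, size_t len)
{
    assert(hv->cnt < HDR_MAX_IOV);
    hv->iov[hv->cnt].iov_base = (void *)s;
    hv->iov[hv->cnt].iov_len = len;
    hv->cnt++;
}

/* One header line becomes four segments: name, ": ", value, CRLF. */
static int hdr_vec_add(hdr_vec_t *hv, const char *name, const char *value)
{
    if (hv->cnt + 4 > HDR_MAX_IOV)
        return -1;
    hdr_vec_push(hv, name, strlen(name));
    hdr_vec_push(hv, ": ", 2);
    hdr_vec_push(hv, value, strlen(value));
    hdr_vec_push(hv, "\r\n", 2);
    return 0;
}

static size_t hdr_vec_total(const hdr_vec_t *hv)
{
    size_t total = 0;
    for (int i = 0; i < hv->cnt; i++)
        total += hv->iov[i].iov_len;
    return total;
}

/* Hand every segment to the kernel in a single call.
 * A real caller would have to loop on short writes. */
static ssize_t hdr_vec_flush(const hdr_vec_t *hv, int fd)
{
    return writev(fd, hv->iov, hv->cnt);
}
```

The trade-off is that nothing is copied up front, so very short segments stay scattered; a real implementation might still coalesce tiny pieces before writing.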
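On the MD5 question: if the hash only has to spread keys across buckets and the full key is compared on lookup, a much simpler non-cryptographic hash such as 64-bit FNV-1a (one XOR and one multiply per input byte, no padding or finalization step) might be enough. A minimal sketch, assuming collision resistance against deliberately crafted URLs is not required; cache_key_hash is a hypothetical name, not mod_cache's function. Whether it is actually faster than MD5 for typical URL lengths on a given CPU would need measuring, and if the cache uses the hash alone to identify an entry, a 64-bit non-cryptographic hash raises the collision risk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a parameters. */
#define FNV64_OFFSET_BASIS UINT64_C(0xcbf29ce484222325)
#define FNV64_PRIME        UINT64_C(0x100000001b3)

/* Hypothetical cache-key hash: FNV-1a over the key bytes.
 * Not collision-resistant; callers must still compare full keys. */
static uint64_t cache_key_hash(const char *key, size_t len)
{
    uint64_t h = FNV64_OFFSET_BASIS;

    assert(key != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];  /* XOR in the next byte... */
        h *= FNV64_PRIME;            /* ...then multiply, wrapping mod 2^64 */
    }
    return h;
}
```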
Brian