Posted to dev@httpd.apache.org by Brian Pane <br...@apache.org> on 2002/09/09 06:32:25 UTC

Performance profiling data for 2.0.41

I just ran some CPU profiling tests on 2.0.41 (Sander's "PRE1" tag).
For those working on performance tuning and/or mod_cache development,
the results show a few significant opportunities for improvement.

Test case:  httpd-2.0.41 on Linux 2.4/x86
            prefork MPM
            one client (ab -c1) requesting the same 1KB file repeatedly
Instrumentation: oprofile-0.3  http://oprofile.sourceforge.net/


Top 20 functions without mod_mem_cache enabled:
 (The numbers indicate the percentage of httpd's total
 user-mode CPU time contributed by each function.)

	8.4%	memcpy
	3.5%	ap_directory_walk
	2.6%	apr_palloc
	2.2%	memset
	2.0%	config_log_transaction

	1.8%	setsockopt
	1.7%	apr_table_get
	1.5%	apr_brigade_puts
	1.4%	apr_filepath_merge
	1.3%	apr_table_setn

	1.1%	core_output_filter
	1.0%	pcre_exec
	1.0%	match_headers
	0.9%	ap_get_mime_headers_core
	0.9%	add_any_filter_handle

	0.9%	match
	0.8%	ap_read_request
	0.8%	merge_core_dir_configs
	0.8%	ap_http_header_filter
	0.8%	__offtime


Top 20 functions with mod_mem_cache enabled:

	12.7%	memcpy
	3.4%	apr_palloc
	3.2%	memset
	2.9%	config_log_transaction
	2.5%	apr_table_get

	1.9%	apr_brigade_puts
	1.6%	apr_pstrdup
	1.5%	apr_table_setn
	1.4%	core_output_filter
	1.3%	match_boyer_moore_horspool

	1.3%	MD5Transform
	1.1%	cache_url_handler
	1.1%	apr_brigade_write
	1.0%	add_any_filter_handle
	1.0%	apr_brigade_cleanup

	0.9%	ap_http_header_filter
	0.9%	pthread_setcanceltype
	0.8%	cache_hash
	0.8%	process_item
	0.8%	ap_rgetline_core

Notes:

* This profiler doesn't collect call chain information, so it's
  not clear where all the memcpy calls are happening.  But that's
  probably something worth investigating, as memcpy is by far the
  most expensive single operation.

* The time spent in each apr_palloc is usually very small--just
  an if-statement and a bit of pointer arithmetic in the common
  case.  So its appearance so high on the list is due to the
  frequency with which it's called.

* The memset calls are from calloc (in the cache code) and
  apr_pcalloc (everywhere else).


* Based on the data, ap_directory_walk and config_log_transaction
  could use some more optimization work.  These functions have
  been rising toward the top of the profile listings with each
  2.0.x release--not because they've gotten slower, but because
  other things have gotten faster.

* I've run out of optimization ideas for the apr_table functions.

* apr_brigade_puts gets called many times per request--at least
  twice per line of the response header.  I have some ideas for
  fixing this by adding writev-style functionality to the
  brigade code, so that several strings can be appended in a
  single call.

* MD5Transform is used in generating cache keys.  Is it possible
  to use a cheaper hash function, or do we really need MD5 here?
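
To illustrate the apr_palloc note above, here is a minimal sketch of a
bump allocator whose common case is one if-statement plus pointer
arithmetic.  It is not the APR implementation: toy_pool, toy_align and
toy_palloc are hypothetical names, and the pool is assumed to be a
single fixed buffer with no growth and no cleanups.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy pool: one fixed buffer, no growth, no cleanup lists. */
typedef struct {
    unsigned char buf[4096];
    size_t used;
} toy_pool;

/* Round a request up to the next multiple of 8 bytes. */
static size_t toy_align(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Common case: one comparison and a pointer bump.  Returns NULL when the
 * buffer is exhausted (a real pool would fetch another block instead). */
static void *toy_palloc(toy_pool *p, size_t n)
{
    size_t size;
    void *mem;

    assert(p != NULL);
    size = toy_align(n);
    if (size < n || size > sizeof(p->buf) - p->used)
        return NULL;
    mem = p->buf + p->used;
    p->used += size;
    return mem;
}
```

Each call here is cheap, so a function like this can only climb a
profile through sheer call volume, which is the point of the note.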
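
For the apr_brigade_puts note, a rough sketch of what a writev-style
append might look like.  This is only an illustration under stated
assumptions, not a proposed patch: toy_buf, toy_vec and toy_buf_writev
are hypothetical names, the buffer is a plain fixed-size array rather
than a real bucket brigade, and splitting a header line into name,
separator, value and CRLF is my assumption about the call pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the buffer at the tail of a brigade. */
typedef struct {
    char data[8192];
    size_t len;
} toy_buf;

/* Hypothetical (pointer, length) pair, in the spirit of struct iovec. */
typedef struct {
    const char *base;
    size_t len;
} toy_vec;

/* Append all pieces in one call.  The total size is validated up front,
 * so either every piece is written or none is.
 * Returns 0 on success, -1 if the pieces do not fit. */
static int toy_buf_writev(toy_buf *b, const toy_vec *vec, size_t nvec)
{
    size_t total = 0;
    size_t i;

    assert(b != NULL && (vec != NULL || nvec == 0));
    for (i = 0; i < nvec; i++) {
        if (vec[i].len > sizeof(b->data) - b->len - total)
            return -1;
        total += vec[i].len;
    }
    for (i = 0; i < nvec; i++) {
        if (vec[i].len > 0) {
            memcpy(b->data + b->len, vec[i].base, vec[i].len);
            b->len += vec[i].len;
        }
    }
    return 0;
}
```

With something along these lines, one header line costs a single call
(for example name, ": ", value, CRLF passed as four toy_vec entries)
instead of one call per piece.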
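
On the MD5Transform question, this is the kind of cheaper hash I have
in mind, sketched with 32-bit FNV-1a purely as an example.  FNV-1a is
my substitution for illustration, not what the cache code uses, and
toy_fnv1a32 is a hypothetical name.  It is not a cryptographic hash, so
it would only be suitable if the cache keys do not depend on MD5's
collision resistance, which is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: one XOR and one multiply per input byte. */
static uint32_t toy_fnv1a32(const char *key, size_t len)
{
    uint32_t h = 2166136261u;      /* FNV offset basis */
    size_t i;

    assert(key != NULL || len == 0);
    for (i = 0; i < len; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;            /* FNV prime */
    }
    return h;
}
```

A caller would hash the URL string and use the result as the lookup
key; a 32-bit value collides far more readily than a 128-bit digest,
so the stored entry would still have to be compared against the full
key.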


Brian