Posted to dev@apr.apache.org by Brian Pane <bp...@pacbell.net> on 2001/07/08 03:25:50 UTC
Initial profiling data on the SMS pools implementation
I used gprof to do some performance profiling of an httpd built
using SMS-based pools.
Below is the list of the top 100 functions, ranked by CPU usage.
(The full output of gprof, showing what functions contribute the
most invocations of what other functions, is ~300KB, so I haven't
included it here. I'm happy to send a copy to anybody who's
interested in the details.)
Background information on how the data was collected:
1. Prefork MPM on linux
2. The httpd was configured with a single worker process.
It served 500,000 requests for a 10KB, non-server-parsed file.
3. The profile data includes server start-up, so a lot of config-time
functions rank prominently in the results. Item 26 in the list
is where the interesting part begins.
4. Because of the recursion implicit in some parts of Apache
(including the SMS framework), gprof reports on "cycles" as
part of its function measurements. I've eliminated these
from the summary data for clarity. That's why some numbers
in the 1-100 list are absent.
Here's my interpretation of the data:
1. Ignoring all the configuration-phase stuff, the SMS allocation
functions are where most of the time is being spent. (Note:
this profile includes only usr time, not sys, and it doesn't
include libc.)
2. The httpd did about 120 pool allocation operations per request.
Apparently there's a lot of code in the httpd that assumes
apr_pstrdup is a really cheap function call. :-)
3. The original pool implementation spends much of its time in
mutex ops, but the SMS-based one doesn't. This looks like a
major win.
4. In 500,000 requests, there were 3,088 instances when
apr_sms_trivial_malloc called apr_sms_malloc to get more
memory from the parent SMS. (This isn't apparent in the
summary below, but it shows up in the detailed gprof output.)
This suggests that the initial memory allocation for the
per-request pool is typically sufficient to handle all
allocations requested from that pool during the request.
5. Putting all of these observations together, it looks like
much of the SMS allocation cost is contained within
apr_sms_trivial_malloc, rather than distributed throughout
the framework. If my conclusion here is correct, that's
good news, because it will make optimization a lot easier.
--Brian
       %time   time    time        num       function
               (self)  (children)  calls
-------------------------------------------------------
[1] 100.0 0.00 0.02 main [1]
[2] 57.5 0.01 0.00 2802 apr_file_gets [2]
[3] 57.5 0.00 0.01 2806 ap_cfg_getline [3]
[4] 57.5 0.00 0.01 2802 cfg_getstr [4]
[5] 50.0 0.01 0.00 587612 apr_file_read [5]
[6] 42.5 0.00 0.01 500000 ap_mpm_pod_check [6]
[7] 42.5 0.00 0.01 1 ap_mpm_run [7]
[8] 42.5 0.00 0.01 1 child_main [8]
[9] 42.5 0.00 0.01 1 make_child [9]
[10] 42.5 0.00 0.01 1 startup_children [10]
[11] 39.9 0.00 0.01 6 ap_build_config [11]
[12] 39.9 0.00 0.01 2 ap_read_config [12]
[13] 26.6 0.00 0.01 4 process_command_config [13]
[14] 17.6 0.00 0.00 2 ap_run_post_config [14]
[15] 17.6 0.00 0.00 2 mime_post_config [15]
[16] 13.3 0.00 0.00 2 ap_process_resource_config [16]
[18] 1.1 0.00 0.00 16 start_ifmod <cycle 1> [18]
[19] 1.1 0.00 0.00 10 ap_soak_end_container [19]
[20] 0.5 0.00 0.00 6 ap_build_cont_config <cycle 1> [20]
[21] 0.0 0.00 0.00 2 ap_process_config_tree [21]
[26] 0.0 0.00 0.00 60006704 apr_sms_trivial_malloc <cycle 5> [26]
[27] 0.0 0.00 0.00 43507050 apr_sms_malloc <cycle 5> [27]
[28] 0.0 0.00 0.00 17500002 apr_table_get [28]
[29] 0.0 0.00 0.00 14001170 apr_sms_calloc [29]
[30] 0.0 0.00 0.00 14001169 apr_sms_default_calloc [30]
[31] 0.0 0.00 0.00 9500002 conv_10 [31]
[32] 0.0 0.00 0.00 7000000 heap_read [32]
[33] 0.0 0.00 0.00 7000000 process_item [33]
[34] 0.0 0.00 0.00 6501048 apr_array_push [34]
[35] 0.0 0.00 0.00 6500814 apr_pstrdup [35]
[36] 0.0 0.00 0.00 6000233 apr_pstrcat [36]
[37] 0.0 0.00 0.00 5000000 apr_brigade_puts [37]
[38] 0.0 0.00 0.00 5000000 apr_brigade_write [38]
[39] 0.0 0.00 0.00 5000000 check_brigade_flush [39]
[40] 0.0 0.00 0.00 4500344 make_array_core [40]
[41] 0.0 0.00 0.00 4500068 apr_sms_cleanup_register [41]
[42] 0.0 0.00 0.00 4500067 apr_pool_cleanup_register [42]
[43] 0.0 0.00 0.00 4500026 apr_sms_trivial_free [43]
[44] 0.0 0.00 0.00 4500003 apr_set_option [44]
[45] 0.0 0.00 0.00 4000003 apr_is_option_set [45]
[46] 0.0 0.00 0.00 4000002 apr_vformatter [46]
[47] 0.0 0.00 0.00 4000000 ap_os_is_path_absolute [47]
[48] 0.0 0.00 0.00 4000000 apr_bucket_shared_destroy [48]
[49] 0.0 0.00 0.00 4000000 form_header_field [49]
[50] 0.0 0.00 0.00 3500104 apr_table_make [50]
[51] 0.0 0.00 0.00 3500010 ap_regexec [51]
[52] 0.0 0.00 0.00 3500010 pcre_exec [52]
[53] 0.0 0.00 0.00 3500010 regexec [53]
[54] 0.0 0.00 0.00 3500000 ap_pass_brigade <cycle 3> [54]
[55] 0.0 0.00 0.00 3500000 apr_brigade_cleanup [55]
[56] 0.0 0.00 0.00 3500000 apr_brigade_create [56]
[57] 0.0 0.00 0.00 3500000 constant_item [57]
[58] 0.0 0.00 0.00 3500000 heap_destroy [58]
[59] 0.0 0.00 0.00 3000446 apr_table_setn [59]
[60] 0.0 0.00 0.00 3000000 add_any_filter [60]
[61] 0.0 0.00 0.00 3000000 ap_get_brigade <cycle 2> [61]
[62] 0.0 0.00 0.00 3000000 apr_array_append [62]
[63] 0.0 0.00 0.00 3000000 apr_array_cat [63]
[64] 0.0 0.00 0.00 3000000 apr_array_copy_hdr [64]
[65] 0.0 0.00 0.00 3000000 apr_table_addn [65]
[66] 0.0 0.00 0.00 3000000 copy_array_hdr_core [66]
[67] 0.0 0.00 0.00 2500050 apr_pstrndup [67]
[68] 0.0 0.00 0.00 2500003 apr_setsocketopt [68]
[69] 0.0 0.00 0.00 2500003 apr_time_now [69]
[70] 0.0 0.00 0.00 2500002 put_scoreboard_info [70]
[71] 0.0 0.00 0.00 2500000 ap_getline [71]
[72] 0.0 0.00 0.00 2500000 ap_http_filter <cycle 2> [72]
[73] 0.0 0.00 0.00 2500000 apr_bucket_shared_split [73]
[74] 0.0 0.00 0.00 2500000 apr_bucket_simple_copy [74]
[75] 0.0 0.00 0.00 2500000 apr_bucket_simple_split [75]
[76] 0.0 0.00 0.00 2500000 apr_psprintf [76]
[77] 0.0 0.00 0.00 2500000 apr_pvsprintf [77]
[78] 0.0 0.00 0.00 2500000 apr_sms_realloc [78]
[79] 0.0 0.00 0.00 2500000 apr_sms_trivial_realloc [79]
[80] 0.0 0.00 0.00 2500000 check_symlinks [80]
[81] 0.0 0.00 0.00 2000026 apr_sms_free [81]
[82] 0.0 0.00 0.00 2000011 apr_sms_cleanup_unregister [82]
[83] 0.0 0.00 0.00 2000002 ap_update_child_status [83]
[84] 0.0 0.00 0.00 2000000 ap_add_output_filter [84]
[85] 0.0 0.00 0.00 2000000 ap_make_dirstr_prefix [85]
[86] 0.0 0.00 0.00 2000000 ap_method_number_of [86]
[87] 0.0 0.00 0.00 2000000 apr_bucket_destroy_noop [87]
[88] 0.0 0.00 0.00 2000000 brigade_cleanup [88]
[89] 0.0 0.00 0.00 2000000 core_output_filter [89]
[90] 0.0 0.00 0.00 1500034 ap_getword [90]
[91] 0.0 0.00 0.00 1500030 apr_ansi_time_to_apr_time [91]
[92] 0.0 0.00 0.00 1500001 apr_snprintf [92]
[93] 0.0 0.00 0.00 1500000 ap_find_token [93]
[94] 0.0 0.00 0.00 1500000 ap_method_list_add [94]
[95] 0.0 0.00 0.00 1500000 ap_no2slash [95]
[96] 0.0 0.00 0.00 1500000 ap_strcasestr [96]
[97] 0.0 0.00 0.00 1500000 apr_brigade_destroy [97]
[98] 0.0 0.00 0.00 1500000 apr_bucket_shared_make [98]
[99] 0.0 0.00 0.00 1500000 conv_p2 [99]
[100] 0.0 0.00 0.00 1500000 explode_time [100]
Re: Initial profiling data on the SMS pools implementation
Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sat, Jul 07, 2001 at 06:25:50PM -0700, Brian Pane wrote:
> Here's my interpretation of the data:
> 1. Ignoring all the configuration-phase stuff, the sms allocation
> functions are where most of the time is being spent. (Note:
> this profile includes only usr time, not sys, and it doesn't
> include libc.)
> 2. The httpd did about 120 pool allocation operations per request.
> Apparently there's a lot of code in the httpd that assumes
> apr_pstrdup is a really cheap function call. :-)
> 3. The original pool implementation spends much of its time in
> mutex ops, but the SMS-based one doesn't. This looks like a
> major win.
> 4. In 500,000 requests, there were 3,088 instances when
> apr_sms_trivial_malloc called apr_sms_malloc to get more
> memory from the parent SMS. (This isn't apparent in the
> summary below, but it shows up in the detailed gprof output.)
> This suggests that the initial memory allocation for the
> per-request pool is typically sufficient to handle all
> allocations requested from that pool during the request.
>
> 5. Putting all of these observations together, it looks like
> much of the SMS allocation cost is contained within
> apr_sms_trivial_malloc, rather than distributed throughout
> the framework. If my conclusion here is correct, that's
> good news, because it will make optimization a lot easier.
Cool. Thanks for running this. =) BTW, was there a performance
penalty by using SMS? (I'm knee deep in other stuff right now, so I
don't have time to analyze the performance impact myself.)
I think as we get more experience we'll get even better. I personally
like the fact that we have decoupled the memory allocation code, since
this now enables us to play with other memory management techniques.
Once it settles down (and I have a solid benchmark program written),
I'll probably play with some different strategies. Trivial is very
similar to the current pool allocator, so I wonder if there is benefit
in using other allocation strategies.
And, if we arrange the SMS in proper order, threaded MPMs should benefit
tremendously because we reduce the need for mutex locking (which I think
is far and away where we would see improvement with this code). Sander
has a thread SMS (very similar to trivial), but I haven't had a chance
to look at it yet in detail. (I don't think he has committed it yet.)
-- justin
P.S. If you could email me a copy of the gprof output or post it on a
website somewhere, I'd like to take a look at it.
Re: Initial profiling data on the SMS pools implementation
Posted by Cliff Woolley <cl...@yahoo.com>.
On Sat, 7 Jul 2001, Brian Pane wrote:
> I used gprof to do some performance profiling of an httpd built
> using SMS-based pools.
>
> Below is the list of the top 100 functions, ranked by CPU usage.
> (The full output of gprof, showing what functions contribute the
> most invocations of what other functions, is ~300KB, so I haven't
> included it here. I'm happy to send a copy to anybody who's
> interested in the details.)
Cool... thanks. I don't suppose there's any way to attach gprof to httpd
AFTER it's started up, is there? I'd really love to see those numbers.
I'm trying to figure out why there were so many calls to apr_file_read()
and no apparent calls to sendfile() or mmap(). I realize that libc
functions were not included in the list, but I'd think we'd have seen the
related apr/apr-util/httpd functions in the list and many fewer calls to
apr_file_read().
I'm really interested in taking a look at the full data. Can you just
post it somewhere on the web and list the URL here?
Thanks,
Cliff
--------------------------------------------------------------
Cliff Woolley
cliffwoolley@yahoo.com
Charlottesville, VA