You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Brian Pane <bp...@pacbell.net> on 2001/07/08 03:25:50 UTC

Initial profiling data on the SMS pools implementation

I used gprof to do some performance profiling of an httpd built
using SMS-based pools.

Below is the list of the top 100 functions, ranked by CPU usage.
(The full output of gprof, showing what functions contribute the
most invocations of what other functions, is ~300KB, so I haven't
included it here.  I'm happy to send a copy to anybody who's
interested in the details.)

Background information on how the data was collected:
1. Prefork MPM on linux
2. The httpd was configured with a single worker process.
   It served 500,000 requests for a 10KB, non-server-parsed file.
3. The profile data includes server start-up, so a lot of config-time
   functions rank prominently in the results.  Item 26 in the list
   is where the interesting part begins.
4. Because of the recursion implicit in some parts of Apache
   (including the SMS framework), gprof reports on "cycles" as
   part of its function measurements.  I've eliminated these
   from the summary data for clarity.  That's why some numbers
   in the 1-100 list are absent.

Here's my interpretation of the data:
1. Ignoring all the configuration-phase stuff, the sms allocation
   functions are where most of the time is being spent.  (Note:
   this profile includes only usr time, not sys, and it doesn't
   include libc.)
2. The httpd did about 120 pool allocation operations per request.
   Apparently there's a lot of code in the httpd that assumes
   apr_pstrdup is a really cheap function call. :-)
3. The original pool implementation spends much of its time in
   mutex ops, but the SMS-based one doesn't.  This looks like a
   major win.
4. In 500,000 requests, there were 3,088 instances when
   apr_sms_trivial_malloc called apr_sms_malloc to get more
   memory from the parent SMS.  (This isn't apparent in the
   summary below, but it shows up in the detailed gprof output.)
   This suggests that the initial memory allocation for the
   per-request pool is typically sufficient to handle all
   allocations requested from that pool during the request.

5. Putting all of these observations together, it looks like
   much of the SMS allocation cost is contained within
   apr_sms_trivial_malloc, rather than distributed throughout
   the framework.  If my conclusion here is correct, that's
   good news, because it will make optimization a lot easier.
--Brian

       %time    time    time     num         function
               (self) (children) calls
-------------------------------------------------------
[1]    100.0    0.00    0.02                 main [1]
[2]     57.5    0.01    0.00    2802         apr_file_gets [2]
[3]     57.5    0.00    0.01    2806         ap_cfg_getline [3]
[4]     57.5    0.00    0.01    2802         cfg_getstr [4]
[5]     50.0    0.01    0.00  587612         apr_file_read [5]
[6]     42.5    0.00    0.01  500000         ap_mpm_pod_check [6]
[7]     42.5    0.00    0.01       1         ap_mpm_run [7]
[8]     42.5    0.00    0.01       1         child_main [8]
[9]     42.5    0.00    0.01       1         make_child [9]
[10]    42.5    0.00    0.01       1         startup_children [10]
[11]    39.9    0.00    0.01       6         ap_build_config [11]
[12]    39.9    0.00    0.01       2         ap_read_config [12]
[13]    26.6    0.00    0.01       4         process_command_config [13]
[14]    17.6    0.00    0.00       2         ap_run_post_config [14]
[15]    17.6    0.00    0.00       2         mime_post_config [15]
[16]    13.3    0.00    0.00       2         ap_process_resource_config [16]
[18]     1.1    0.00    0.00      16         start_ifmod <cycle 1> [18]
[19]     1.1    0.00    0.00      10         ap_soak_end_container [19]
[20]     0.5    0.00    0.00       6         ap_build_cont_config <cycle 
1> [20]
[21]     0.0    0.00    0.00       2         ap_process_config_tree [21]
[26]     0.0    0.00    0.00 60006704         apr_sms_trivial_malloc 
<cycle 5> [26]
[27]     0.0    0.00    0.00 43507050         apr_sms_malloc <cycle 5> [27]
[28]     0.0    0.00    0.00 17500002         apr_table_get [28]
[29]     0.0    0.00    0.00 14001170         apr_sms_calloc [29]
[30]     0.0    0.00    0.00 14001169         apr_sms_default_calloc [30]
[31]     0.0    0.00    0.00 9500002         conv_10 [31]
[32]     0.0    0.00    0.00 7000000         heap_read [32]
[33]     0.0    0.00    0.00 7000000         process_item [33]
[34]     0.0    0.00    0.00 6501048         apr_array_push [34]
[35]     0.0    0.00    0.00 6500814         apr_pstrdup [35]
[36]     0.0    0.00    0.00 6000233         apr_pstrcat [36]
[37]     0.0    0.00    0.00 5000000         apr_brigade_puts [37]
[38]     0.0    0.00    0.00 5000000         apr_brigade_write [38]
[39]     0.0    0.00    0.00 5000000         check_brigade_flush [39]
[40]     0.0    0.00    0.00 4500344         make_array_core [40]
[41]     0.0    0.00    0.00 4500068         apr_sms_cleanup_register [41]
[42]     0.0    0.00    0.00 4500067         apr_pool_cleanup_register [42]
[43]     0.0    0.00    0.00 4500026         apr_sms_trivial_free [43]
[44]     0.0    0.00    0.00 4500003         apr_set_option [44]
[45]     0.0    0.00    0.00 4000003         apr_is_option_set [45]
[46]     0.0    0.00    0.00 4000002         apr_vformatter [46]
[47]     0.0    0.00    0.00 4000000         ap_os_is_path_absolute [47]
[48]     0.0    0.00    0.00 4000000         apr_bucket_shared_destroy [48]
[49]     0.0    0.00    0.00 4000000         form_header_field [49]
[50]     0.0    0.00    0.00 3500104         apr_table_make [50]
[51]     0.0    0.00    0.00 3500010         ap_regexec [51]
[52]     0.0    0.00    0.00 3500010         pcre_exec [52]
[53]     0.0    0.00    0.00 3500010         regexec [53]
[54]     0.0    0.00    0.00 3500000         ap_pass_brigade <cycle 3> [54]
[55]     0.0    0.00    0.00 3500000         apr_brigade_cleanup [55]
[56]     0.0    0.00    0.00 3500000         apr_brigade_create [56]
[57]     0.0    0.00    0.00 3500000         constant_item [57]
[58]     0.0    0.00    0.00 3500000         heap_destroy [58]
[59]     0.0    0.00    0.00 3000446         apr_table_setn [59]
[60]     0.0    0.00    0.00 3000000         add_any_filter [60]
[61]     0.0    0.00    0.00 3000000         ap_get_brigade <cycle 2> [61]
[62]     0.0    0.00    0.00 3000000         apr_array_append [62]
[63]     0.0    0.00    0.00 3000000         apr_array_cat [63]
[64]     0.0    0.00    0.00 3000000         apr_array_copy_hdr [64]
[65]     0.0    0.00    0.00 3000000         apr_table_addn [65]
[66]     0.0    0.00    0.00 3000000         copy_array_hdr_core [66]
[67]     0.0    0.00    0.00 2500050         apr_pstrndup [67]
[68]     0.0    0.00    0.00 2500003         apr_setsocketopt [68]
[69]     0.0    0.00    0.00 2500003         apr_time_now [69]
[70]     0.0    0.00    0.00 2500002         put_scoreboard_info [70]
[71]     0.0    0.00    0.00 2500000         ap_getline [71]
[72]     0.0    0.00    0.00 2500000         ap_http_filter <cycle 2> [72]
[73]     0.0    0.00    0.00 2500000         apr_bucket_shared_split [73]
[74]     0.0    0.00    0.00 2500000         apr_bucket_simple_copy [74]
[75]     0.0    0.00    0.00 2500000         apr_bucket_simple_split [75]
[76]     0.0    0.00    0.00 2500000         apr_psprintf [76]
[77]     0.0    0.00    0.00 2500000         apr_pvsprintf [77]
[78]     0.0    0.00    0.00 2500000         apr_sms_realloc [78]
[79]     0.0    0.00    0.00 2500000         apr_sms_trivial_realloc [79]
[80]     0.0    0.00    0.00 2500000         check_symlinks [80]
[81]     0.0    0.00    0.00 2000026         apr_sms_free [81]
[82]     0.0    0.00    0.00 2000011         apr_sms_cleanup_unregister [82]
[83]     0.0    0.00    0.00 2000002         ap_update_child_status [83]
[84]     0.0    0.00    0.00 2000000         ap_add_output_filter [84]
[85]     0.0    0.00    0.00 2000000         ap_make_dirstr_prefix [85]
[86]     0.0    0.00    0.00 2000000         ap_method_number_of [86]
[87]     0.0    0.00    0.00 2000000         apr_bucket_destroy_noop [87]
[88]     0.0    0.00    0.00 2000000         brigade_cleanup [88]
[89]     0.0    0.00    0.00 2000000         core_output_filter [89]
[90]     0.0    0.00    0.00 1500034         ap_getword [90]
[91]     0.0    0.00    0.00 1500030         apr_ansi_time_to_apr_time [91]
[92]     0.0    0.00    0.00 1500001         apr_snprintf [92]
[93]     0.0    0.00    0.00 1500000         ap_find_token [93]
[94]     0.0    0.00    0.00 1500000         ap_method_list_add [94]
[95]     0.0    0.00    0.00 1500000         ap_no2slash [95]
[96]     0.0    0.00    0.00 1500000         ap_strcasestr [96]
[97]     0.0    0.00    0.00 1500000         apr_brigade_destroy [97]
[98]     0.0    0.00    0.00 1500000         apr_bucket_shared_make [98]
[99]     0.0    0.00    0.00 1500000         conv_p2 [99]
[100]    0.0    0.00    0.00 1500000         explode_time [100]



Re: Initial profiling data on the SMS pools implementation

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sat, Jul 07, 2001 at 06:25:50PM -0700, Brian Pane wrote:
> Here's my interpretation of the data:
> 1. Ignoring all the configuration-phase stuff, the sms allocation
>    functions are where most of the time is being spent.  (Note:
>    this profile includes only usr time, not sys, and it doesn't
>    include libc.)
> 2. The httpd did about 120 pool allocation operations per request.
>    Apparently there's a lot of code in the httpd that assumes
>    apr_pstrdup is a really cheap function call. :-)
> 3. The original pool implementation spends much of its time in
>    mutex ops, but the SMS-based one doesn't.  This looks like a
>    major win.
> 4. In 500,000 requests, there were 3,088 instances when
>    apr_sms_trivial_malloc called apr_sms_malloc to get more
>    memory from the parent SMS.  (This isn't apparent in the
>    summary below, but it shows up in the detailed gprof output.)
>    This suggests that the initial memory allocation for the
>    per-request pool is typically sufficient to handle all
>    allocations requested from that pool during the request.
> 
> 5. Putting all of these observations together, it looks like
>    much of the SMS allocation cost is contained within
>    apr_sms_trivial_malloc, rather than distributed throughout
>    the framework.  If my conclusion here is correct, that's
>    good news, because it will make optimization a lot easier.

Cool.  Thanks for running this.  =)  BTW, was there a performance
penalty by using SMS?  (I'm knee deep in other stuff right now, so I
don't have time to analyze the performance impact myself.)

I think as we get more experience we'll get even better.  I personally
like that fact that we have decoupled the memory allocation code, so
this now enables us to play with other memory management techniques.
Once it settles down (and I have a solid benchmark program written),
I'll probably play with some different strategies.  Trivial is very
similar to the current pool allocator, so I wonder if there is benefit
by using other allocation strategies.

And, if we arrange the SMS in proper order, threaded MPMs should benefit 
tremendously because we reduce the need for mutex locking (which I think
is far and away where we would see improvement with this code).  Sander 
has a thread SMS (very similar to trivial), but I haven't had a chance 
to look at it yet in detail.  (I don't think he has committed it yet.) 
-- justin

P.S. If you could email me a copy of the gprof output or post it on a
website somewhere, I'd like to take a look at it.


Re: Initial profiling data on the SMS pools implementation

Posted by Cliff Woolley <cl...@yahoo.com>.
On Sat, 7 Jul 2001, Brian Pane wrote:

> I used gprof to do some performance profiling of an httpd built
> using SMS-based pools.
>
> Below is the list of the top 100 functions, ranked by CPU usage.
> (The full output of gprof, showing what functions contribute the
> most invocations of what other functions, is ~300KB, so I haven't
> included it here.  I'm happy to send a copy to anybody who's
> interested in the details.)

Cool... thanks.  I don't suppose there's any way to attach gprof to httpd
AFTER it's started up, is there?  I'd really love to see those numbers.
I'm trying to figure out why there were so many calls to apr_file_read()
and no apparent calls to sendfile() or mmap().  I realize that libc
functions were not included in the list, but I'd think we'd have seen the
related apr/apr-util/httpd functions in the list and many fewer calls to
apr_file_read().

I'm really interested in taking a look at the full data.  Can you just
post it somewhere on the web and list the URL here?

Thanks,
Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA