You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Bill Stoddard <bi...@wstoddard.com> on 2002/07/02 22:29:59 UTC

Apache 2 cycle vs instruction profiling

Profiled serving a 500 byte file out of mod_mem_cache (CacheEnable mem /).
This analysis covers one complete request starting at ap_read_request and
ending with ap_graceful_stop_signalled.

I've hacked Apache 2 in several places...
The most notable hack is that I remove ap_http_header_filter and send the
pre-built headers straight out of the cache. This substantially reduced the
time spent in apr_brigade_puts (and was good for a 4% or so performance
boost). Can get a lot fancier here by making sure the buffer holding the
headers is properly aligned on a page boundary.

I also shortcircuit the age and freshness calculations in mod_cache (to save
gettimeofday calls). I know we have performance problems with apr_time_now
with potential fixes being discussed.

Last hack is to do a call apr_setsocketopt(csd, APR_INCOMPLETE_READ, 1)
immediately after the call to ap_graceful_stop_signaled to cause the next
ap_read_request to go directly to select rather than calling read first. I
think this is a good optimization because we just checked for a pipelined
request (read returned EAGAIN) so no sense in calling read again. Will
submit a patch for this piece later.

Instructions = Cycles/Ratio

I assume a large ratio is probably a bad thing and indicates, among other
things, that pipeline flushes, pagefaults, etc, are occuring...  I have a
routine flow which should show transtions to the kernel (not sure if it will
show pagefaults or not) that I have not spent a lot of time on. We spend a
suprising amount of time in module log routines and logging is not enabled.

Bill

Space                      %    Cycles
=====                     ====  =====
User                      19.0  46645
Shared Library            48.6  119222
Kernel                    32.4  79405

Total                           245272 (4.22 per instruction)

/usr/local/apachetest/bin/httpd :

Subroutine Name          Source File                  Ratio Enter  %
Cycles
===============          ===========                  ===== ===== ====
======
.core_input_filter       core.c                        3.15 11     1.5 3628
.ap_rgetline_core        protocol.c                    3.05 6      1.3 3241
.get_filter_handle       util_filter.c                 4.58 2      1.1 2705
.ap_find_token           util.c                        4.16 6      0.9 2306
.add_any_filter_handle   util_filter.c                 4.37 6      0.9 2090
.net_time_filter         core.c                        2.90 12     0.7 1713
._ptrgl                  ptrgl.s                       4.13 66     0.7 1637
.fix_hostname            vhost.c                       3.65 1      0.7 1626
.core_output_filter      core.c                        3.63 2      0.6 1469
.ap_update_child_status_from_indexes scoreboard.c     17.94 2      0.6 1435
.add_any_filter          util_filter.c                 5.46 1      0.6 1403
.ap_get_brigade          util_filter.c                 3.32 25     0.4 1079
.ap_set_keepalive        http_protocol.c               3.50 2      0.4 1036
._moveeq                 moveeq.s                      4.72 14     0.4 1025
.ap_read_request         protocol.c                    6.32 1      0.3 834
.ap_getword_white        util.c                        3.09 2      0.3 784
.ap_get_mime_headers     protocol.c                    3.32 1      0.3 756
.ap_content_length_filter protocol.c                   4.33 1      0.3 736
.ap_pass_brigade         util_filter.c                 4.82 5      0.2 584
.ap_recent_rfc822_date   util_time.c                   2.74 1      0.2 520
.log_error_core          log.c                         4.28 4      0.2 513
.read_request_line       protocol.c                    4.15 1      0.2 502
.apr_palloc              glink.s                       3.71 22     0.2 490
.apr_brigade_destroy     glink.s                       6.51 12     0.2 469
.isspace                 glink.s                       3.45 22     0.2 455
.core_create_req         core.c                        7.57 1      0.2 439
.apr_table_get           glink.s                       4.50 16     0.2 432
.ap_run_log_transaction  protocol.c                   12.05 1      0.2 398
.ap_run_quick_handler    config.c                     13.96 1      0.2 391
.cached_explode          util_time.c                   5.61 1      0.1 365
.ap_run_create_request   request.c                     8.06 1      0.1 363
.ap_log_error            log.c                         3.82 4      0.1 336
.tolower                 glink.s                       2.93 19     0.1 334
.ap_http_filter          http_protocol.c               3.35 2      0.1 332
.ap_add_output_filters_by_type core.c                 14.32 1      0.1 329
.apr_brigade_create      glink.s                       3.48 15     0.1 313
.__divi64                glink.s                      12.72 4      0.1 305
.isalpha                 glink.s                       2.64 19     0.1 301
.lookup_builtin_method   http_protocol.c              10.91 1      0.1 295
.apr_setsocketopt        glink.s                       4.39 11     0.1 290
.ap_process_http_connection http_core.c                8.17 0      0.1 286
.writev_it_all           core.c                        6.07 1      0.1 273
.basic_http_header       http_protocol.c               3.09 1      0.1 260
.http_create_request     http_core.c                   6.19 1      0.1 235
.strlen                  glink.s                       9.76 4      0.1 234
.apr_brigade_split_line  glink.s                       6.39 6      0.1 230
.apr_table_make          glink.s                       6.28 6      0.1 226
.form_header_field       http_protocol.c               2.73 2      0.1 213
.avoid_xlc_bug           core.c                        4.79 11     0.1 211
.ap_byterange_filter     http_protocol.c               7.26 1      0.1 210
.ap_run_post_read_request protocol.c                   5.88 1      0.1 200
.ap_discard_request_body http_protocol.c               2.82 1      0.1 200
.apr_pstrcatv            glink.s                      29.67 1      0.1 178
.strchr                  glink.s                       7.39 4      0.1 177
.basic_http_header_check http_protocol.c               4.18 1      0.1 176
.check_pipeline_flush    http_request.c                3.97 1      0.1 175
.ap_update_vhost_from_headers vhost.c                  5.61 1      0.1 174
.memset                  glink.s                       6.95 4      0.1 167
.strncasecmp             glink.s                       5.44 5      0.1 163
.create_empty_config     config.c                      7.75 1      0.1 163
.apr_brigade_puts        glink.s                       6.48 4      0.1 156
.ap_process_request      http_request.c                5.68 1      0.1 153
.ap_graceful_stop_signalled worker.c                  51.13 1      0.1 153
.apr_brigade_write       glink.s                       3.18 8      0.1 153
.ap_parse_uri            protocol.c                    3.40 1      0.1 150
.ap_set_byterange        http_protocol.c               4.54 1      0.1 145
.ap_add_output_filter    util_filter.c                17.40 1      0.1 139
.apr_pool_destroy        glink.s                      22.80 1      0.1 137
.ap_basic_http_header    http_protocol.c               9.04 1      0.1 136
.apr_table_unset         glink.s                      10.73 2      0.1 129
.apr_sendv               glink.s                      20.00 1      0.0 120
.ap_update_child_status  scoreboard.c                  8.30 2      0.0 116
.ap_index_of_response    http_protocol.c               6.09 1      0.0 116
.ap_get_limit_req_body   core.c                       15.43 1      0.0 108
.apr_array_make          glink.s                      17.97 1      0.0 108
.ap_get_server_version   core.c                       21.36 1      0.0 107
.remove_any_filter       util_filter.c                 3.44 2      0.0 107
.apr_table_mergen        glink.s                       8.72 2      0.0 105
.isdigit                 glink.s                       8.20 2      0.0 98
.apr_table_overlap       glink.s                      16.00 1      0.0 96
.apr_pstrdup             glink.s                      15.73 1      0.0 94
.apr_table_addn          glink.s                       3.93 4      0.0 94
.ap_get_output_filter_handle util_filter.c            15.53 2      0.0 93
.apr_off_t_toa           glink.s                      15.53 1      0.0 93
.apr_uri_parse           glink.s                      15.33 1      0.0 92
.apr_brigade_partition   glink.s                       3.73 4      0.0 90
.apr_pool_create_ex      glink.s                      14.60 1      0.0 88
.apr_brigade_split       glink.s                       2.85 5      0.0 86
.ap_make_method_list     http_protocol.c               3.25 1      0.0 81
.apr_parse_addr_port     glink.s                      13.10 1      0.0 79
.ap_set_content_length   protocol.c                    3.73 1      0.0 75
.ap_remove_output_filter util_filter.c                 3.42 2      0.0 68
.ap_explode_recent_gmt   util_time.c                  17.05 1      0.0 68
.apr_psprintf            glink.s                       5.57 2      0.0 67
.ap_set_content_type     http_protocol.c               2.91 1      0.0 61
.apr_bucket_flush_create glink.s                       9.67 1      0.0 58
.apr_brigade_cleanup     glink.s                       9.17 1      0.0 55
.ap_create_request_config config.c                    54.20 1      0.0 54
.ap_add_output_filter_handle util_filter.c             2.83 3      0.0 51
.ap_method_number_of     http_protocol.c               2.40 1      0.0 48
.ap_finalize_request_protocol protocol.c               2.47 1      0.0 47
.apr_table_setn          glink.s                       2.39 3      0.0 43
.apr_time_now            glink.s                       6.23 1      0.0 37
.ap_add_input_filter_handle util_filter.c              2.83 2      0.0 34
.apr_pstrmemdup          glink.s                       5.67 1      0.0 34
.apr_bucket_eos_create   glink.s                       2.17 1      0.0 13

Shlib Subroutine         Source File                  Ratio Enter  %
Cycles
================         ===========                  ===== ===== ====
======
.memset                  memset.s                      4.99 9      5.5 13580
.apr_table_get           apr_tables.c                  3.15 21     2.1 5030
.__is_wctype_std         libc/__is_wctype_std.c        3.34 68     1.8 4536
.strcasecmp              libaixinet/strcasecmp.c       3.47 12     1.8 4458
.apr_palloc              apr_pools.c                   1.97 104    1.5 3680
.apr_vformatter          apr_snprintf.c                3.83 2      1.3 3231
.apr_table_setn          apr_tables.c                  2.97 13     1.2 2936
._moveeq                 moveeq.s                      3.83 11     1.1 2818
.apr_setsocketopt        sockopt.c                     3.53 13     1.1 2753
.__divu64                divu64.s                      2.75 3      1.1 2675
.memchr                  libc/memchr.c                 3.61 11     1.1 2668
.__divi64                divi64.s                      2.71 5      1.0 2509
.strlen                  strlen.s                      3.18 17     0.8 2077
.pthread_mutex_lock      libpthreads/mutex.c          10.40 3      0.8 1841
.apr_bucket_alloc        apr_buckets_alloc.c           2.72 28     0.7 1684
.apr_brigade_cleanup     apr_brigade.c                 2.74 17     0.7 1648
.apr_brigade_puts        apr_brigade.c                 2.32 5      0.6 1585
.isspace                 libc/isspace.c                2.84 23     0.5 1306
.apr_brigade_split_line  apr_brigade.c                 2.43 6      0.5 1297
.apr_brigade_write       apr_brigade.c                 2.22 11     0.5 1247
.apr_brigade_create      apr_brigade.c                 2.11 21     0.5 1199
.strncasecmp             libaixinet/strcasecmp.c       2.74 6      0.5 1177
.cache_url_handler       mod_cache.c                   5.88 1      0.5 1147
.allocator_alloc         apr_pools.c                   4.47 5      0.5 1143
._moveeq                 moveeq.s                      4.70 16     0.5 1118
.apr_pool_cleanup_register apr_pools.c                 2.17 22     0.4 1099
.apr_bucket_free         apr_buckets_alloc.c           2.87 28     0.4 1080
._ptrgl                  ptrgl.s                       2.46 72     0.4 1063
._ptrgl                  ptrgl.s                       2.65 63     0.4 1000
.allocator_free          apr_pools.c                   4.23 5      0.4 960
.tolower                 libc/tolower.c                2.87 19     0.4 926
.apr_bucket_simple_copy  apr_buckets_simple.c          3.39 14     0.4 902
.multi_log_transaction   mod_log_config.c             28.55 1      0.4 885
.make_array_core         apr_tables.c                  2.40 12     0.3 790
.apr_pool_cleanup_kill   apr_pools.c                   4.05 12     0.3 786
.unserialize_table       mod_mem_cache.c               5.17 4      0.3 781
.strchr                  strchr.s                      2.90 5      0.3 778
.apr_array_push_noclear  apr_tables.c                  1.69 20     0.3 776
.isupper                 libc/isupper.c                1.94 19     0.3 739
.isalpha                 libc/isalpha.c                1.92 19     0.3 730
.apr_recv                sendrecv.c                    6.67 2      0.3 693
.heap_bucket_destroy     apr_buckets_heap.c            2.49 17     0.3 644
.apr_uri_parse           apr_uri.c                     5.22 1      0.3 627
.match_headers           mod_setenvif.c               16.49 1      0.3 627
.apr_table_mergen        apr_tables.c                  3.11 2      0.3 621
.find_entry              cache_hash.c                  4.42 1      0.3 619
.apr_table_overlap       apr_tables.c                  2.98 1      0.2 611
.pthread_mutex_unlock    libpthreads/mutex.c           5.02 3      0.2 603
.ap_cache_get_cachetype  cache_util.c                 10.48 1      0.2 597
.overlap_hash            apr_tables.c                  3.50 4      0.2 588
.apr_palloc              glink.s                       2.62 37     0.2 583
.apr_bucket_simple_split apr_buckets_simple.c          1.82 10     0.2 564
.apr_table_unset         apr_tables.c                  2.72 2      0.2 564
.read                    libc/read.c                   7.89 2      0.2 552
.spin_lock_global_ppc_up locks_ppc_up.s               16.71 3      0.2 551
.apr_brigade_destroy     apr_brigade.c                 2.68 12     0.2 547
.apr_bucket_alloc        glink.s                       3.21 28     0.2 539
.heap_bucket_read        apr_buckets_heap.c            1.95 30     0.2 528
.open_entity             mod_mem_cache.c               6.32 1      0.2 512
.apr_brigade_partition   apr_brigade.c                 3.31 4      0.2 476
.apr_table_make          apr_tables.c                  2.50 10     0.2 475
.islower                 libc/islower.c                5.89 4      0.2 471
.run_cleanups            apr_pools.c                   9.58 1      0.2 460
.apr_pool_cleanup_kill   glink.s                       6.06 12     0.2 437
.socket_bucket_read      apr_buckets_socket.c          3.62 2      0.2 428
.pthread_mutex_lock      glink.s                      23.62 3      0.2 425
.cache_out_filter        mod_cache.c                   4.35 1      0.2 387
.apr_palloc              glink.s                       3.07 21     0.2 387
.apr_sendv               sendrecv.c                    8.37 1      0.2 385
.apr_pvsprintf           apr_pools.c                   3.91 2      0.2 383
.apr_pstrcat             apr_strings.c                 3.15 1      0.2 378
.conv_10                 apr_snprintf.c                2.64 4      0.2 370
.apr_bucket_heap_make    apr_buckets_heap.c            3.81 3      0.1 365
.apr_pool_cleanup_register glink.s                     2.88 21     0.1 362
.apr_bucket_shared_split apr_buckets_refcount.c        2.00 10     0.1 360
.apr_wait_for_io_or_timeout sendrecv.c                 6.40 1      0.1 352
.apr_bucket_shared_destroy glink.s                     3.45 17     0.1 352
.cache_select_url        cache_storage.c               4.76 1      0.1 343
.strcasecmp              glink.s                       5.13 11     0.1 339
.apr_table_addn          apr_tables.c                  2.14 4      0.1 333
.apr_pstrdup             apr_strings.c                 2.84 5      0.1 330
.apr_parse_addr_port     sockaddr.c                    5.30 1      0.1 323
.apr_bucket_shared_destroy apr_buckets_refcount.c      3.07 17     0.1 313
.kread                   glink.s                      26.03 2      0.1 312
.pthread_mutex_unlock    glink.s                      16.82 3      0.1 303
.isdigit                 libc/isdigit.c                5.03 3      0.1 302
.apr_brigade_split       apr_brigade.c                 2.12 5      0.1 292
.apr_allocator_alloc     glink.s                      12.09 4      0.1 290
.spin_unlock_global_ppc_up locks_ppc_up.s              6.94 3      0.1 271
.apr_thread_mutex_lock   thread_mutex.c                5.60 3      0.1 269
.decrement_refcount      mod_mem_cache.c              10.18 1      0.1 265
.apr_pool_destroy        apr_pools.c                   4.97 1      0.1 263
.writev                  libc/write.c                 13.13 1      0.1 263
._ptrgl                  ptrgl.s                      10.81 4      0.1 259
.strlen                  glink.s                       5.28 8      0.1 254
.read_headers            mod_mem_cache.c               3.96 1      0.1 253
.apr_bucket_simple_split glink.s                       4.07 10     0.1 244
.apr_pstrcatv            apr_strings.c                 2.88 1      0.1 242
.read_real_time          read_real_time.s              8.61 1      0.1 241
.__pthread_geterrno_addr libpthreads/lib_lock.c        4.91 4      0.1 236
.apr_pool_create_ex      apr_pools.c                   3.50 1      0.1 235
._Errno                  libc/errno.c                  3.99 3      0.1 228
.apr_bucket_shared_make  glink.s                      12.47 3      0.1 224
.read_body               mod_mem_cache.c               5.32 1      0.1 223
.apr_atomic_dec          apr_atomic.c                  7.10 1      0.1 220
.apr_pstrmemdup          glink.s                      36.37 1      0.1 218
._ptrgl                  ptrgl.s                       8.62 4      0.1 207
.memchr                  glink.s                       3.18 10     0.1 191
._ptrgl                  ptrgl.s                       6.33 5      0.1 190
.memcache_gdsf_algorithm mod_mem_cache.c               6.68 2      0.1 187
.apr_off_t_toa           apr_strings.c                 2.84 1      0.1 182
.apr_atomic_inc          apr_atomic.c                  6.14 1      0.1 178
.apr_pstrndup            apr_strings.c                 5.72 1      0.1 177
.cache_run_open_entity   cache_storage.c               5.50 1      0.1 176
.ap_cache_liststr        cache_util.c                  4.14 2      0.1 174
.read                    glink.s                      14.18 2      0.1 170
.apr_allocator_free      glink.s                       7.08 4      0.1 170
.gettimeofday            libc/gettimeofday.c           3.96 1      0.1 166
.ap_cache_tokstr         cache_util.c                  3.66 1      0.1 165
.memcmp                  memcmp.s                      4.41 1      0.1 159
.apr_thread_mutex_unlock thread_mutex.c                3.24 3      0.1 156
.apr_pstrmemdup          apr_strings.c                 3.33 2      0.1 153
.apr_table_setn          glink.s                       2.50 10     0.1 150
.memset                  glink.s                      12.15 2      0.1 146
.apr_recv                glink.s                      11.87 2      0.1 142
.islower                 glink.s                       5.60 4      0.1 134
.apr_thread_mutex_lock   glink.s                      11.12 2      0.1 133
.apr_time_now            time.c                        5.29 1      0.1 132
.ap_set_content_type     glink.s                      21.80 1      0.1 131
.__divi64                glink.s                      21.70 1      0.1 130
.apr_vformatter          glink.s                      10.83 2      0.1 130
.strlen                  glink.s                       4.33 5      0.1 130
.cache_hash_get          glink.s                      21.33 1      0.1 128
.memcmp                  glink.s                      21.30 1      0.1 128
._Errno                  glink.s                       7.04 3      0.1 127
.ap_cache_get_cachetype  glink.s                      20.77 1      0.1 125
.ap_cache_tokstr         glink.s                      20.60 1      0.1 124
.apr_table_make          glink.s                       5.09 4      0.0 122
.apr_bucket_heap_create  glink.s                      10.13 2      0.0 122
.ap_cache_liststr        glink.s                      10.08 2      0.0 121
.cache_find              glink.s                      18.63 1      0.0 112
.cache_update            glink.s                      18.63 1      0.0 112
.cache_pq_change_priority cache_pqueue.c               5.30 1      0.0 111
.apr_bucket_shared_copy  apr_buckets_refcount.c        1.69 4      0.0 108
.apr_atomic_dec          glink.s                      17.50 1      0.0 105
.kwritev                 glink.s                      17.13 1      0.0 103
.gettimeofday            glink.s                      17.10 1      0.0 103
.apr_bucket_eos_create   apr_buckets_eos.c             2.55 2      0.0 102
.apr_bucket_socket_create apr_buckets_socket.c         4.82 1      0.0 101
.cache_find              cache_cache.c                 7.71 1      0.0 100
.bzero                   glink.s                      16.67 1      0.0 100
.ap_log_error            glink.s                       4.16 4      0.0 100
.free_proc_chain         apr_pools.c                   6.97 1      0.0 98
.apr_table_get           glink.s                       3.25 5      0.0 98
.cache_read_entity_body  cache_storage.c               9.74 1      0.0 97
.writev                  glink.s                      16.00 1      0.0 96
.apr_bucket_heap_create  apr_buckets_heap.c            1.79 2      0.0 93
.time_base_to_time       libc/POWER/time_base_to_time.c  1.66 1      0.0 93
.apr_pstrdup             glink.s                       5.16 3      0.0 93
.memset                  glink.s                      15.47 1      0.0 93
.ap_pass_brigade         glink.s                       7.70 2      0.0 92
.select                  glink.s                      15.30 1      0.0 92
.apr_bucket_shared_make  apr_buckets_refcount.c        5.08 3      0.0 91
.apr_thread_mutex_lock   glink.s                      15.13 1      0.0 91
.apr_bucket_immortal_create apr_buckets_simple.c          3.54 1      0.0 85
.isdigit                 glink.s                      13.77 1      0.0 83
.ap_add_output_filter    glink.s                      13.53 1      0.0 81
.minchild                cache_pqueue.c                4.48 1      0.0 76
.apr_bucket_simple_copy  glink.s                       3.08 4      0.0 74
.cache_generate_key_default cache_storage.c               4.29 1      0.0 73
.cache_read_entity_body  glink.s                      11.87 1      0.0 71
.apr_bucket_free         glink.s                       2.93 4      0.0 70
.apr_bucket_eos_make     apr_buckets_eos.c             4.80 2      0.0 67
.apr_array_make          apr_tables.c                  3.36 1      0.0 67
.cache_pq_percolate_down cache_pqueue.c                3.94 1      0.0 67
.cache_update            cache_cache.c                 2.16 1      0.0 67
.apr_psprintf            apr_pools.c                   1.83 2      0.0 66
.apr_brigade_create      glink.s                      10.97 1      0.0 66
.apr_thread_mutex_unlock glink.s                       5.48 2      0.0 66
.ap_set_keepalive        glink.s                       5.45 2      0.0 65
.apr_bucket_eos_create   glink.s                      10.77 1      0.0 65
.apr_allocator_mutex_get apr_pools.c                  15.90 2      0.0 64
.strncasecmp             glink.s                      10.47 1      0.0 63
.memchr                  glink.s                      10.17 1      0.0 61
.brigade_cleanup         apr_brigade.c                15.00 4      0.0 60
.apr_bucket_flush_create apr_buckets_flush.c           2.86 1      0.0 57
.cache_read_entity_headers cache_storage.c               2.60 1      0.0 57
.apr_bucket_immortal_create glink.s                       9.23 1      0.0 55
.apr_getsocketopt        glink.s                       9.10 1      0.0 55
.apr_atomic_inc          glink.s                       9.00 1      0.0 54
.simple_bucket_read      apr_buckets_simple.c          3.26 2      0.0 52
.apr_pstrcat             glink.s                       8.33 1      0.0 50
.cache_hash_get          cache_hash.c                  3.54 1      0.0 50
.ap_remove_output_filter glink.s                       8.10 1      0.0 49
.memcache_get_pos        mod_mem_cache.c              16.00 1      0.0 48
.apr_palloc              glink.s                       3.93 2      0.0 47
.ap_get_output_filter_handle glink.s                       3.90 2      0.0
47
.apr_brigade_write       glink.s                       2.39 3      0.0 43
.strcasecmp              glink.s                       7.07 1      0.0 42
.apr_bucket_immortal_make apr_buckets_simple.c          5.86 1      0.0 41
.bzero                   memset.s                     13.27 1      0.0 40
.apr_bucket_heap_make    glink.s                       6.50 1      0.0 39
.ap_basic_http_header    glink.s                       6.17 1      0.0 37
.apr_bucket_socket_make  apr_buckets_socket.c          5.23 1      0.0 37
.apr_pstrndup            glink.s                       5.63 1      0.0 34
.strchr                  glink.s                       5.23 1      0.0 31
.apr_bucket_destroy_noop apr_buckets.c                 7.00 4      0.0 28
.apr_allocator_free      apr_pools.c                   7.00 4      0.0 28
.apr_allocator_alloc     apr_pools.c                   7.00 4      0.0 28
.apr_getsocketopt        sockopt.c                     4.47 1      0.0 27
.apr_setsocketopt        glink.s                       2.17 2      0.0 26
.isspace                 glink.s                       4.07 1      0.0 24
.apr_allocator_owner_get apr_pools.c                   5.00 2      0.0 20
.memcache_inc_frequency  mod_mem_cache.c               2.14 1      0.0 15
.apr_bucket_flush_make   apr_buckets_flush.c           2.00 1      0.0 14
.apr_brigade_puts        glink.s                       2.17 1      0.0 13
.memset                  glink.s                       2.17 1      0.0 13
.cache_select_url        glink.s                       2.17 1      0.0 13
.apr_pstrdup             glink.s                       2.17 1      0.0 13
.cache_pq_change_priority glink.s                       2.17 1      0.0 13
.apr_pool_cleanup_register glink.s                       2.17 1      0.0 13
.apr_thread_mutex_unlock glink.s                       2.17 1      0.0 13
.eos_bucket_read         apr_buckets_eos.c             2.75 1      0.0 11


Re: Apache 2 cycle vs instruction profiling

Posted by Justin Erenkrantz <je...@apache.org>.
On Tue, Jul 02, 2002 at 03:22:51PM -0700, Ryan Morgan wrote:
> 
> I sent a patch to the list a while back for that one:
> 
> http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=101445378225873&w=2

All I can say is that when you don't hear about a patch, please
resubmit until you get feedback.  -- justin

Re: Apache 2 cycle vs instruction profiling

Posted by Ryan Morgan <rm...@covalent.net>.
I sent a patch to the list a while back for that one:

http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=101445378225873&w=2

On Tue, Jul 02, 2002 at 02:33:38PM -0700, Justin Erenkrantz wrote:
> On Tue, Jul 02, 2002 at 02:13:23PM -0700, Brian Pane wrote:
> > We may be able to replace some more ap_add_*_filter() calls with
> > ap_add_*_filter_handle() to reduce the cost of the add_any_filter()
> > trie lookup.
> 
> CORE_IN and CORE are added via ap_add_*_filter() (server/core.c:4063).
> 
> I'll bet that's why this function is so high.  -- justin
> 

Re: Apache 2 cycle vs instruction profiling

Posted by Justin Erenkrantz <je...@apache.org>.
On Tue, Jul 02, 2002 at 02:13:23PM -0700, Brian Pane wrote:
> We may be able to replace some more ap_add_*_filter() calls with
> ap_add_*_filter_handle() to reduce the cost of the add_any_filter()
> trie lookup.

CORE_IN and CORE are added via ap_add_*_filter() (server/core.c:4063).

I'll bet that's why this function is so high.  -- justin

RE: Apache 2 cycle vs instruction profiling

Posted by Bill Stoddard <bi...@wstoddard.com>.
> Bill Stoddard wrote:
>
> >Subroutine Name          Source File                  Ratio Enter  %
> >Cycles
> >===============          ===========                  ===== ===== ====
> >======
> >.core_input_filter       core.c                        3.15 11
>   1.5 3628
> >.ap_rgetline_core        protocol.c                    3.05 6
>   1.3 3241
> >
>
> Are these numbers from the latest version of ap_rgetline_core(), with
> the optimization to re-use the same temp brigade for the lookahead read?
> If so, I'll proceed with the next wave of fixes on that function (using
> a persistent brigade) to try to optimize it out of the top 5.

These results are from 1.107 and the optimization went into 1.108. I'll
rerun the profile with HEAD and post the results.

>
> >.get_filter_handle       util_filter.c                 4.58 2
>   1.1 2705
> >.ap_find_token           util.c                        4.16 6
>   0.9 2306
> >
>
> Wow, I've never seen ap_find_token() rank so high in a profile before.
> Does your profiling tool show any information on where it's
> getting called?

ap_set_keepalive calls ap_find_token three times. Ooops, I see I am calling
ap_set_keepalive twice (once to many) in mod_cache. I'll correct this.

>
> >.fix_hostname            vhost.c                       3.65 1
>   0.7 1626
> >
>
> fix_hostname() is spending all its time in string processing and
> string-to-int
> conversions.  There's probably an optimization opportunity there.
>
> >.ap_update_child_status_from_indexes scoreboard.c     17.94 2
>   0.6 1435
> >
>
> Is the high ratio here just a result of the cost of writing to the
> scoreboard
> in shared memory?

I suspect so.

Bill




Re: Apache 2 cycle vs instruction profiling

Posted by Brian Pane <br...@apache.org>.
Bill Stoddard wrote:

>Subroutine Name          Source File                  Ratio Enter  %
>Cycles
>===============          ===========                  ===== ===== ====
>======
>.core_input_filter       core.c                        3.15 11     1.5 3628
>.ap_rgetline_core        protocol.c                    3.05 6      1.3 3241
>

Are these numbers from the latest version of ap_rgetline_core(), with
the optimization to re-use the same temp brigade for the lookahead read?
If so, I'll proceed with the next wave of fixes on that function (using
a persistent brigade) to try to optimize it out of the top 5.

>.get_filter_handle       util_filter.c                 4.58 2      1.1 2705
>.ap_find_token           util.c                        4.16 6      0.9 2306
>

Wow, I've never seen ap_find_token() rank so high in a profile before.
Does your profiling tool show any information on where it's getting called?

>.fix_hostname            vhost.c                       3.65 1      0.7 1626
>

fix_hostname() is spending all its time in string processing and 
string-to-int
conversions.  There's probably an optimization opportunity there.

>.ap_update_child_status_from_indexes scoreboard.c     17.94 2      0.6 1435
>

Is the high ratio here just a result of the cost of writing to the 
scoreboard
in shared memory?

>.add_any_filter          util_filter.c                 5.46 1      0.6 1403
>

We may be able to replace some more ap_add_*_filter() calls with
ap_add_*_filter_handle() to reduce the cost of the add_any_filter()
trie lookup.

--Brian