Posted to dev@apr.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2002/06/28 16:33:27 UTC

Breaking something? Now is the time?

I have one bit that must be broken before 1.0, and cannot be remedied easily.
I'd like to simply break these things before Apache 2.0.40 is tagged.

apr_pstrcatv should never have been declared _NONSTD.  It was, and there
isn't much we can do about it without breaking binary compat or introducing
a replacement symbol name.  Since we like our apr_pstrcat and apr_foov
conventions, I don't want to rename.

This is very similar to Cliff's requested change to apr_table_[v]do syntax,
returning an int instead of a void.  We like the existing names there, too.
There are two subtle differences though.

On Win32 (the only platform that cares about _NONSTD), the symbol name
will actually _change_ when it goes from _NONSTD (where it is 'apr_pstrcatv')
over to APR_DECLARE() (where it will be _apr_pstrcatv@16 ... designating
the stack args size per MS exports convention.)
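
For readers unfamiliar with the export naming mentioned above, here is a minimal sketch of how the 32-bit __stdcall decoration `_name@N` is formed, where N is the number of argument bytes on the stack. The helper `sketch_stdcall_name` is hypothetical (it is not an APR or Microsoft API), and it assumes every argument occupies a whole number of 4-byte stack slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of APR: build the 32-bit __stdcall
 * decorated name "_name@N", where N is the total size of the arguments
 * in bytes, each rounded up to a 4-byte stack slot.
 * Returns what snprintf returns (the length of the decorated name). */
static int sketch_stdcall_name(char *out, size_t outlen, const char *name,
                               const size_t *arg_sizes, size_t nargs)
{
    size_t total = 0;
    size_t i;

    assert(out != NULL && name != NULL);
    for (i = 0; i < nargs; i++) {
        total += (arg_sizes[i] + 3u) & ~(size_t)3u;  /* round up to 4 */
    }
    return snprintf(out, outlen, "_%s@%lu", name, (unsigned long)total);
}
```

Called with the name apr_pstrcatv and four 4-byte arguments (the 32-bit sizes of three pointers and one apr_size_t), the helper produces _apr_pstrcatv@16, which is the name quoted above.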

With the apr_table_[v]do change, we won't change the declaration.  Only the
return type will change.  I agree the odds are that all platforms return the
value in a register and that the stack isn't affected.  Odds are that the
register will be silently discarded without harm or foul.  The possibility that
the return value is on the stack in some C implementations remains, however.

If we insist on breaking things, occasionally, pre-1.0, this looks like the
time.  If Cliff wants to commit the semantic change to apr_table_[v]do, I'll +1
here and raise you a _NONSTD correction.  Along with Sander's changes to make
the unsafe transparent apr_allocator.h structure opaque, I'd say we have a bit
of worthwhile breakage to inflict before we go on.

By the way, 99.5% of coders will be unaffected by any of these three changes.
They can take advantage of the apr_table_[v]do change or ignore it.  Most folks
haven't implemented custom apr_allocators just yet - those that have likely
followed its progress.  And the Win32 change simply requires a recompile, which
is unfortunate, but can't be avoided due to the apr_allocator.h stuff anyways.

Comments?

Bill

--- include/apr_strings.h	12 May 2002 00:56:26 -0000	1.25
+++ include/apr_strings.h	28 Jun 2002 14:21:07 -0000
@@ -179,8 +179,8 @@
   * @param nbytes (output) strlen of new string (pass in NULL to omit)
   * @return The new string
   */
-APR_DECLARE_NONSTD(char *) apr_pstrcatv(apr_pool_t *p, const struct iovec *vec,
-                                        apr_size_t nvec, apr_size_t *nbytes);
+APR_DECLARE(char *) apr_pstrcatv(apr_pool_t *p, const struct iovec *vec,
+                                 apr_size_t nvec, apr_size_t *nbytes);

  /**
   * printf-style style printing routine.  The data is output to a string
--- strings/apr_strings.c	13 May 2002 16:09:22 -0000	1.27
+++ strings/apr_strings.c	28 Jun 2002 14:21:07 -0000
@@ -177,8 +177,8 @@
      return res;
  }

-APR_DECLARE_NONSTD(char *) apr_pstrcatv(apr_pool_t *a, const struct iovec *vec,
-                                        apr_size_t nvec, apr_size_t *nbytes)
+APR_DECLARE(char *) apr_pstrcatv(apr_pool_t *a, const struct iovec *vec,
+                                 apr_size_t nvec, apr_size_t *nbytes)
  {
      apr_size_t i;
      apr_size_t len;


Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, William A. Rowe, Jr. wrote:

> If it is used by -anybody- they trust the existing implementation.
> That said, it should behave sensibly.  The fact that you've asked three
> times says you want to change it.

Hehehehe You noticed?  :)  Sorry to be a pest, I'm just getting sick of
changing things only to have someone come behind a week later and say "you
can't do that."  I just wanted to be sure.

Thanks,
Cliff


Re: Breaking something? Now is the time?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
If it is used by -anybody- they trust the existing implementation.

That said, it should behave sensibly.  The fact that you've asked three
times says you want to change it.

Make it so ;-)

Bill

At 01:38 PM 6/28/2002, Cliff Woolley wrote:
>On Fri, 28 Jun 2002, William A. Rowe, Jr. wrote:
>
> > IMHO, the implementation is what people have tested, not the documented
> > behavior.  Use the source, luke :-)
>
>But what I'm saying is that I don't think anybody *has* tested it.  I
>couldn't find a single use case in Apache where the called function would
>ever return anything other than 1, meaning that this "early-termination"
>functionality is not used by Apache AFAICT.
>
>--Cliff



Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, William A. Rowe, Jr. wrote:

> IMHO, the implementation is what people have tested, not the documented
> behavior.  Use the source, luke :-)

But what I'm saying is that I don't think anybody *has* tested it.  I
couldn't find a single use case in Apache where the called function would
ever return anything other than 1, meaning that this "early-termination"
functionality is not used by Apache AFAICT.
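
To make the "early-termination" idea concrete, here is a minimal sketch of an iteration function that stops when the callback returns 0 and reports that through an int return value. It assumes zero is the stop signal; all names are hypothetical stand-ins, and this is not the apr_table_do implementation or a statement of what its documentation says.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the APR table types. */
typedef struct {
    const char *key;
    const char *val;
} sketch_entry_t;

typedef int (sketch_callback_fn)(void *rec, const char *key, const char *val);

/* Calls comp for each entry.  Stops as soon as comp returns 0 and reports
 * that by returning 0; returns 1 if every entry was visited. */
static int sketch_table_do(sketch_callback_fn *comp, void *rec,
                           const sketch_entry_t *entries, size_t n)
{
    size_t i;

    assert(comp != NULL);
    for (i = 0; i < n; i++) {
        if (comp(rec, entries[i].key, entries[i].val) == 0) {
            return 0;   /* early termination requested by the callback */
        }
    }
    return 1;
}

/* Example callback state: count entries, stop once a limit is reached. */
typedef struct {
    int seen;
    int limit;
} sketch_counter_t;

static int sketch_count_until(void *rec, const char *key, const char *val)
{
    sketch_counter_t *c = rec;

    (void)key;
    (void)val;
    c->seen++;
    return c->seen < c->limit;  /* 0 stops the iteration */
}
```

A callback that always returns 1, which is the only case found in Apache according to the message above, never exercises the early-return branch, so a void return type loses nothing for such callers.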

--Cliff


Re: Breaking something? Now is the time?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 01:22 PM 6/28/2002, you wrote:
>On Fri, 28 Jun 2002, William A. Rowe, Jr. wrote:
>
> > If Cliff wants to commit the semantic change to apr_table_[v]do, I'll
> > +1 here and raise you a _NONSTD correction.  Along with Sander's
> > changes to make the unsafe transparent apr_allocator.h structure
> > opaque, I'd say we have a bit of worthwhile breakage to inflict before
> > we go on. By the way, 99.5% of coders will be unaffected by any of
> > these three changes. They can take advantage of the apr_table_[v]do
> > change or ignore it.
>
>So you didn't indicate an opinion on whether the existing semantics of
>apr_table_vdo() match their documentation, and if not, whether it's the
>docs or the implementation that have it right.  I need to know in order to
>proceed with the return-type change.

IMHO, the implementation is what people have tested, not the documented
behavior.  Use the source, luke :-)



Re: APR_STATUS_* semantics [Re: Breaking something? Now is the time?]

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, William A. Rowe, Jr. wrote:

> >What I'd like to propose is that we document that, for any given status
> >code, _more_ than one APR_STATUS_IS* macro can match, and it's the
> >programmer's responsibility to decide in what order to make the tests.

+1

--Cliff


Re: APR_STATUS_* semantics [Re: Breaking something? Now is the time?]

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 02:43 PM 6/28/2002, Branko Čibej wrote:
>Since we're talking about semantics, breakage, etc, I'll take the 
>opportunity to bore everybody with an issue I'd like resolved, too; 
>Namely, the semantics of the APR_STATUS_IS_* macros.
>
>I've said several times before that APR_STATUS_IS_ENOENT and 
>APR_STATUS_IS_ENOTDIR don't have the same meaning on Windows and Unix. 
>That's because Windows doesn't have an error code that would mean exactly 
>the same as the Posix ENOTDIR. Simulating it would be a huge cost, though.
>
>Here's an example of the differing behaviour: If "foo" does not exist, 
>doing an apr_stat("foo/bar") will trigger APR_STATUS_IS_ENOENT on Unix,
>but APR_STATUS_IS_ENOTDIR on Windows. That makes it very hard to write a
>portable "mkdir -p" implementation; and, indeed, apr_dir_make_recursive
>can't work correctly on Windows because of that.
>
>What I'd like to propose is that we document that, for any given status 
>code, _more_ than one APR_STATUS_IS* macro can match, and it's the 
>programmer's responsibility to decide in what order to make the tests.

Brane, thank you for a terrific explanation of the issue.  +1 here.

Bill


APR_STATUS_* semantics [Re: Breaking something? Now is the time?]

Posted by Branko Čibej <br...@xbc.nu>.
Since we're talking about semantics, breakage, etc, I'll take the 
opportunity to bore everybody with an issue I'd like resolved, too; 
Namely, the semantics of the APR_STATUS_IS_* macros.

I've said several times before that APR_STATUS_IS_ENOENT and 
APR_STATUS_IS_ENOTDIR don't have the same meaning on Windows and Unix. 
That's because Windows doesn't have an error code that would mean 
exactly the same as the Posix ENOTDIR. Simulating it would be a huge 
cost, though.

Here's an example of the differing behaviour: If "foo" does not exist, 
doing an apr_stat("foo/bar") will trigger APR_STATUS_IS_ENOENT on Unix,
but APR_STATUS_IS_ENOTDIR on Windows. That makes it very hard to write a
portable "mkdir -p" implementation; and, indeed, apr_dir_make_recursive
can't work correctly on Windows because of that.

What I'd like to propose is that we document that, for any given status 
code, _more_ than one APR_STATUS_IS* macro can match, and it's the 
programmer's responsibility to decide in what order to make the tests.

My proposed patch for the ENOENT issue would then be:

Index: apr_errno.h
===================================================================
RCS file: /home/cvs/apr/include/apr_errno.h,v
retrieving revision 1.91
diff -u -r1.91 apr_errno.h
--- apr_errno.h 20 May 2002 13:22:36 -0000      1.91
+++ apr_errno.h 28 Jun 2002 19:40:58 -0000
@@ -923,6 +923,7 @@
                 || (s) == APR_OS_START_SYSERR + WSAENAMETOOLONG)
 #define APR_STATUS_IS_ENOENT(s)         ((s) == APR_ENOENT \
                 || (s) == APR_OS_START_SYSERR + ERROR_FILE_NOT_FOUND \
+                || (s) == APR_OS_START_SYSERR + ERROR_PATH_NOT_FOUND \
                 || (s) == APR_OS_START_SYSERR + ERROR_OPEN_FAILED \
                 || (s) == APR_OS_START_SYSERR + ERROR_NO_MORE_FILES)
 #define APR_STATUS_IS_ENOTDIR(s)        ((s) == APR_ENOTDIR \



Wrowe and I discussed this quite a bit, but haven't come to a final 
decision yet. I'm bringing it up again because this is definitely 
something that has to be fixed before we hit 1.0.
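
As a rough illustration of the proposal that more than one status macro may match and the caller chooses the test order, here is a self-contained sketch. Every name and numeric value in it is an illustrative stand-in rather than a definition from apr_errno.h, and it assumes that the Windows "path not found" code is already matched by the ENOTDIR-style macro, as the apr_stat("foo/bar") example suggests.

```c
#include <assert.h>

/* All names and values below are illustrative stand-ins,
 * not the definitions from apr_errno.h. */
#define SKETCH_ENOENT               2
#define SKETCH_ENOTDIR              20
#define SKETCH_OS_START_SYSERR      720000
#define SKETCH_ERROR_FILE_NOT_FOUND 2
#define SKETCH_ERROR_PATH_NOT_FOUND 3

/* With the proposed change, "path not found" matches both macros. */
#define SKETCH_STATUS_IS_ENOENT(s)  ((s) == SKETCH_ENOENT \
        || (s) == SKETCH_OS_START_SYSERR + SKETCH_ERROR_FILE_NOT_FOUND \
        || (s) == SKETCH_OS_START_SYSERR + SKETCH_ERROR_PATH_NOT_FOUND)
#define SKETCH_STATUS_IS_ENOTDIR(s) ((s) == SKETCH_ENOTDIR \
        || (s) == SKETCH_OS_START_SYSERR + SKETCH_ERROR_PATH_NOT_FOUND)

typedef enum {
    SKETCH_CREATE_PARENT,  /* a path component is missing: create it */
    SKETCH_GIVE_UP,        /* a component exists but is not a directory */
    SKETCH_OTHER           /* anything else */
} sketch_action_t;

/* Decide what a "mkdir -p" style caller should do with a stat status. */
static sketch_action_t sketch_mkdir_p_action(int status)
{
    /* Order matters: test "missing" first, because the Windows-style
     * code above satisfies both macros. */
    if (SKETCH_STATUS_IS_ENOENT(status)) {
        return SKETCH_CREATE_PARENT;
    }
    if (SKETCH_STATUS_IS_ENOTDIR(status)) {
        return SKETCH_GIVE_UP;
    }
    return SKETCH_OTHER;
}
```

A caller that tested the ENOTDIR-style macro first would give up on the Windows-style status instead of creating the missing parent, which is why the order of the tests has to be documented as the programmer's responsibility.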

    Thanks,

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: Breaking something? Now is the time?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 02:11 PM 6/28/2002, Brian Pane wrote:
>I want to break something: binary compatibility for the pool API.
>
>This has been on my list for a long time, but I haven't yet had
>time to implement it.

What you are describing is [was] SMS.

Even with the opaque structure, we are still facing derefs that will
significantly alter performance.

Definitely look in CVS's attic for the original implementation to
start with.

We don't have the time to implement this in the current Apache
release cycle [2.0.40] given the issues people raised with the
original SMS implementation.  I suggest that -if- we prove this
is viable, we introduce it with the rollout of version 1.0.

Bill



Re: Breaking something? Now is the time?

Posted by Emery Berger <em...@cs.utexas.edu>.
Justin Erenkrantz wrote:

> IIRC, I did macroize it during my test runs (at one point at least -
> I may not have committed it) and found no performance improvement.
> The problems seemed to be with the function pointer itself.  It's all
> a little fuzzy though, so it's possible I didn't macroize.  
> 
> A better solution, IMHO, would be just to code a drop-in replacement
> for memory/unix/apr_pools.c.  -- justin

Perhaps it would be useful to know exactly what benchmarks people are 
using to identify the impact of these changes.

-- Emery



Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, Justin Erenkrantz wrote:

> Count me confused, but what is the difference between:
>
> p->alloc
>
> and
>
> #define apr_palloc(...) p->alloc

None, but that's not what we were doing with SMS.

> Or are you referring
> to the fact that we used to have a function like this:
>
> apr_palloc()
> {
>  return p->alloc();
> }


Yeah, but it did even more.  It was more like:

apr_sms_alloc(sms, size)
{
   /* ... do a bunch of stuff here ... */
   rv = sms->type->alloc(sms, size);
   /* ... do some more stuff ... */
   return rv;
}


Re: Breaking something? Now is the time?

Posted by Justin Erenkrantz <je...@apache.org>.
On Fri, Jun 28, 2002 at 12:22:01PM -0700, Brian Pane wrote:
> I think SMS's use of a wrapper function to do the indirect method
> call was the main problem, which is why we'd have to use a macro
> instead if we reintroduced a function pointer model.

Count me confused, but what is the difference between:

p->alloc

and

#define apr_palloc(...) p->alloc

Aren't they going to resolve to the same thing?  Or are you referring
to the fact that we used to have a function like this:

apr_palloc()
{
 return p->alloc();
}

IIRC, I did macroize it during my test runs (at one point at least -
I may not have committed it) and found no performance improvement.
The problems seemed to be with the function pointer itself.  It's all
a little fuzzy though, so it's possible I didn't macroize.  

A better solution, IMHO, would be just to code a drop-in replacement
for memory/unix/apr_pools.c.  -- justin

Re: Breaking something? Now is the time?

Posted by Brian Pane <br...@cnet.com>.
Justin Erenkrantz wrote:

>On Fri, Jun 28, 2002 at 12:11:09PM -0700, Brian Pane wrote:
>  
>
>>I want to break something: binary compatibility for the pool API.
>>
>>This has been on my list for a long time, but I haven't yet had
>>time to implement it.
>>
>>What I'm thinking of is the following:
>>
>>* Preface the apr_pool_t structure with a set of function
>>  pointers for the pool's "methods": alloc, free, destroy,
>>  create subpool, etc.
>>    
>>
>
>Sounds like SMS.  We could never overcome speed limitations and we
>always seemed to place blame on the function pointers as the reason
>why the SMS performance wasn't as good as pools.  
>

I think SMS's use of a wrapper function to do the indirect method
call was the main problem, which is why we'd have to use a macro
instead if we reintroduced a function pointer model.
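
A minimal sketch of the two dispatch styles being compared, assuming a pool structure that starts with a function pointer for its alloc method. The type and function names are hypothetical; this is neither the SMS code nor the current apr_pools.c, and it makes no claim about which style is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical pool type whose first member is one of its "methods". */
typedef struct sketch_pool sketch_pool_t;
struct sketch_pool {
    void *(*alloc)(sketch_pool_t *p, size_t size);
    size_t bytes_requested;   /* bookkeeping for the demo only */
};

/* One possible alloc method: plain malloc, no real pool semantics. */
static void *sketch_malloc_alloc(sketch_pool_t *p, size_t size)
{
    p->bytes_requested += size;
    return malloc(size);
}

/* Style 1: wrapper function.  Each allocation costs a direct call to
 * the wrapper plus an indirect call through the pointer. */
static void *sketch_palloc_fn(sketch_pool_t *p, size_t size)
{
    assert(p != NULL && p->alloc != NULL);
    return p->alloc(p, size);
}

/* Style 2: macro.  Only the indirect call remains at the call site.
 * Note that the pool argument is evaluated twice. */
#define SKETCH_PALLOC(p, size) ((p)->alloc((p), (size)))
```

Both styles still go through the function pointer, so if the pointer itself is the cost, as suggested elsewhere in this thread, the macro alone would not recover it. Measurements would have to settle that.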

>I'd want to see performance metrics saying that we aren't going to
>see a massive performance decrease with this.  -- justin
>  
>

Definitely.

--Brian




Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, Justin Erenkrantz wrote:

> Sounds like SMS.  We could never overcome speed limitations and we
> always seemed to place blame on the function pointers as the reason
> why the SMS performance wasn't as good as pools.

We had function pointers *and* wrapper functions.  We never tried it with
the macros approach (even though I wanted to, I didn't have time to code
it before SMS was summarily nuked).

> I'd want to see performance metrics saying that we aren't going to
> see a massive performance decrease with this.  -- justin

Knowing Brian, I'm sure he wouldn't suggest anything without considering
the performance impact first and foremost.  :-]  But yeah, I'd like to see
the numbers too.

--Cliff


Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Ian Holsman <ia...@apache.org>.
We might be able to remove the 'vary' processing that mod_cache does
by changing the key generation function to add the contents of the vary
request header as part of the key
(and then switching this version on with a directive).
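
One way to read that suggestion is sketched below: the cache key is the URL followed by the values of the request headers that the response varies on, so that variants get distinct keys. This is only an interpretation. The types, the function names and the "|Name=value" key layout are all hypothetical and are not taken from mod_cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical request-header pair; not an httpd or APR type. */
typedef struct {
    const char *name;
    const char *value;
} sketch_header_t;

/* Look up a header value by exact name.  Real HTTP header names are
 * case-insensitive; an exact match keeps the sketch short. */
static const char *sketch_find(const sketch_header_t *hdrs, size_t n,
                               const char *name)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(hdrs[i].name, name) == 0) {
            return hdrs[i].value;
        }
    }
    return "";
}

/* Build "url|Name=value|Name=value..." for each header named in vary[].
 * Returns 0 on success, -1 if out is too small. */
static int sketch_cache_key(char *out, size_t outlen, const char *url,
                            const char *const *vary, size_t nvary,
                            const sketch_header_t *hdrs, size_t nhdrs)
{
    size_t used, i;
    int w;

    assert(out != NULL && url != NULL);
    w = snprintf(out, outlen, "%s", url);
    if (w < 0 || (size_t)w >= outlen) {
        return -1;
    }
    used = (size_t)w;
    for (i = 0; i < nvary; i++) {
        w = snprintf(out + used, outlen - used, "|%s=%s", vary[i],
                     sketch_find(hdrs, nhdrs, vary[i]));
        if (w < 0 || (size_t)w >= outlen - used) {
            return -1;
        }
        used += (size_t)w;
    }
    return 0;
}
```

With such a key, two requests for the same URL that differ in a varied header map to different cache entries, so no separate vary check would be needed after the lookup.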

Bill Stoddard wrote:
>>Bill Stoddard wrote:
>>
>>
>>>Yes, please, we need some performance measurements.  I've been doing some
>>>profiling of Apache 2.0 on AIX and even with mod_mem_cache, we still serve
>>>static files with keep-alive at about half the rate of iPlanet. The sad
>>>thing is I don't see any single smoking guns. Just lots of little stuff
>>>everywhere.
>>>
>>>
>>
>>Indeed, all the big problems have been fixed, and what remains is a long
>>list of small things to optimize.  My list includes:
>>
>>  * the buffering of keepalive responses < 8KB (which turns sendfile
>>    operations into mmap+memcpy)
>>  * lots of string operations in directory_walk/location_walk/file_walk
>>  * the code that creates and destroys a temporary brigade for each line
>>    in order to read the request header
>>  * our memory usage is a bit higher than it probably should be
>>  * regex comparisons in file_walk and mod_setenvif
>>  * mod_mime's find_ct() does too much string manipulation
>>  * apr_table_get (even with all the optimizations that we've
>>    already done)
>>
>>Do you have additional things that you've found in your profiling?
>>
>>Thanks,
>>--Brian
> 
> 
> mod_mem_cache bypasses most all of those things. Using mod_mem_cache to
> cache a buffer in heap (contents of a 500 byte file).  I have even hacked the
> code slightly to turn off the pipelined request optimization. I am playing
> with a new tool so I don't fully grok the output just yet but here is what I
> see serving a single keep alive request out of mod_mem_cache.
> 
> All modules are DSOs (and the link overhead shows this). The divisions are
> real killers (at least on this machine). Suggests that we need a call,
> apr_time_now_sec() to fetch the time with seconds resolution (configurable).
> Now we do multiplications to convert to uS then do divisions to convert back
> to seconds in a number of routines. The time spent in apr_brigade_puts is
> surprising...  This particular run indicates that it took 74355 instructions
> to serve a keep alive request. This profile is taken on unmodified Apache
> 2.0 code. Well almost... the avoid_xlc_bug is a hack to get around a bug in
> the xlc optimizer.  I'll gladly post the hack to the list if you want to see
> it. The density of RING macros in the core_input_filter f*cks up the xlc
> optimizer...
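
The multiply-then-divide pattern described above can be sketched as follows. The function names are hypothetical (apr_time_now_sec is only a suggestion in this thread), and the sketch uses the standard time() call purely to show the arithmetic, not to reproduce APR's clock.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_USEC_PER_SEC INT64_C(1000000)

/* Microsecond-style clock: seconds are multiplied up to microseconds.
 * Resolution here is only one second because the sketch uses time();
 * the point is the arithmetic, not the precision. */
static int64_t sketch_time_now_usec(void)
{
    return (int64_t)time(NULL) * SKETCH_USEC_PER_SEC;
}

/* Callers that only need seconds then pay for a 64-bit division. */
static int64_t sketch_sec_via_division(int64_t usec)
{
    assert(usec >= 0);
    return usec / SKETCH_USEC_PER_SEC;
}

/* Hypothetical seconds-resolution call: no multiply, no divide. */
static int64_t sketch_time_now_sec(void)
{
    return (int64_t)time(NULL);
}
```

On machines where 64-bit division is implemented by a library routine, such as the __divu64 and __divi64 entries in the profile, the third function avoids that call entirely for code that only wants whole seconds.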
> 
> Space                      %    Ticks
> =====                     ====  =====
> User                      15.2  11300
> Shared Library            56.5  42028
> Kernel                    28.3  21027
> 
> Total                           74355 (Footprint = 84.523 KB)
> 
> Pid   Process Name         %    Ticks
> ===== ============        ====  =====
> 23454 httpd              100.0  74355
> 
> Tid   Process Name         %    Ticks
> ===== ============        ====  =====
> 49159 httpd              100.0  74355
> 
> 
> ./httpd :
> 
> Subroutine Name          Source File                  Visit Enter  %   Ticks
> ===============          ===========                  ===== ===== ==== =====
> .core_input_filter       core.c                       51    11     1.6 1165
> .ap_rgetline_core        protocol.c                   66    6      1.4 1061
> .net_time_filter         core.c                       36    12     0.8 615
> .core_output_filter      core.c                       17    3      0.7 520
> .add_any_filter_handle   util_filter.c                12    6      0.6 478
> ._ptrgl                  ptrgl.s                      74    74     0.6 444
> .form_header_field       http_protocol.c              50    10     0.5 390
> .ap_get_brigade          util_filter.c                50    25     0.4 325
> .fix_hostname            vhost.c                      2     1      0.3 259
> .add_any_filter          util_filter.c                1     1      0.3 257
> .ap_getword_white        util.c                       28    2      0.3 254
> .ap_find_token           util.c                       5     3      0.3 229
> .ap_get_mime_headers     protocol.c                   16    1      0.3 228
> ._moveeq                 moveeq.s                     14    14     0.3 217
> .pcre_exec               pcre.c                       4     1      0.3 209
> .ap_recent_rfc822_date   util_time.c                  15    1      0.3 190
> .match                   pcre.c                       1     1      0.2 181
> .ap_pass_brigade         util_filter.c                14    7      0.2 169
> .ap_set_keepalive        http_protocol.c              12    1      0.2 152
> .apr_brigade_write       glink.s                      25    25     0.2 150
> .ap_content_length_filter protocol.c                   6     1      0.2 145
> .ap_read_request         protocol.c                   18    1      0.2 132
> .ap_http_header_filter   http_protocol.c              15    1      0.2 132
> .apr_palloc              glink.s                      22    22     0.2 132
> .isspace                 glink.s                      22    22     0.2 132
> .read_request_line       protocol.c                   11    1      0.2 121
> .log_error_core          log.c                        4     4      0.2 120
> .apr_brigade_puts        glink.s                      20    20     0.2 120
> .apr_table_get           glink.s                      19    19     0.2 114
> .apr_brigade_create      glink.s                      17    17     0.1 102
> .ap_http_filter          http_protocol.c              8     2      0.1 99
> .ap_log_error            log.c                        8     4      0.1 88
> .basic_http_header       http_protocol.c              13    1      0.1 84
> .ap_update_child_status_from_indexes scoreboard.c                 2     2      0.1 80
> .ap_make_content_type    protocol.c                   7     1      0.1 78
> .apr_brigade_destroy     glink.s                      12    12     0.1 72
> .ap_save_brigade         util_filter.c                3     1      0.1 70
> .ap_discard_request_body http_protocol.c              4     1      0.1 68
> .ap_meets_conditions     http_protocol.c              8     1      0.1 67
> .apr_setsocketopt        glink.s                      11    11     0.1 66
> .cached_explode          util_time.c                  5     1      0.1 65
> .regexec                 pcreposix.c                  3     1      0.1 60
> .core_create_req         core.c                       5     1      0.1 58
> .ap_get_remote_host      core.c                       1     1      0.1 56
> .remove_any_filter       util_filter.c                3     3      0.1 48
> .strlen                  glink.s                      8     8      0.1 48
> .check_pipeline_flush    http_request.c               5     1      0.1 45
> .writev_it_all           core.c                       2     1      0.1 45
> .ap_run_create_request   request.c                    3     1      0.1 45
> .avoid_xlc_bug           core.c                       11    11     0.1 44
> .ap_parse_uri            protocol.c                   3     1      0.1 44
> .basic_http_header_check http_protocol.c              3     1      0.1 41
> .ap_run_insert_filter    request.c                    3     1      0.1 38
> .http_create_request     http_core.c                  4     1      0.1 38
> .ap_process_http_connection http_core.c                  5     0      0.1 38
> .terminate_header        http_protocol.c              4     1      0.1 38
> .apr_table_make          glink.s                      6     6      0.0 36
> .apr_brigade_split_line  glink.s                      6     6      0.0 36
> .ap_run_post_read_request protocol.c                   2     1      0.0 34
> .ap_run_log_transaction  protocol.c                   2     1      0.0 33
> .core_insert_filter      core.c                       1     1      0.0 32
> .ap_set_byterange        http_protocol.c              3     1      0.0 32
> .ap_update_vhost_from_headers vhost.c                      3     1      0.0 31
> .fixup_vary              http_protocol.c              3     1      0.0 30
> .apr_brigade_length      glink.s                      5     5      0.0 30
> .ap_remove_output_filter util_filter.c                3     3      0.0 30
> .ap_byterange_filter     http_protocol.c              4     1      0.0 29
> .ap_run_quick_handler    config.c                     2     1      0.0 28
> .lookup_builtin_method   http_protocol.c              1     1      0.0 27
> .ap_process_request      http_request.c               5     1      0.0 27
> .ap_make_method_list     http_protocol.c              3     1      0.0 25
> .apr_brigade_split       glink.s                      4     4      0.0 24
> .apr_table_addn          glink.s                      4     4      0.0 24
> .memset                  glink.s                      4     4      0.0 24
> .strchr                  glink.s                      4     4      0.0 24
> .__divi64                glink.s                      4     4      0.0 24
> .apr_brigade_partition   glink.s                      4     4      0.0 24
> .ap_add_output_filters_by_type core.c                       1     1      0.0 23
> .create_empty_config     config.c                     3     1      0.0 21
> .ap_set_content_type     http_protocol.c              2     1      0.0 21
> .ap_method_number_of     http_protocol.c              3     1      0.0 20
> .ap_get_remote_logname   core.c                       1     1      0.0 20
> .ap_set_content_length   protocol.c                   3     1      0.0 20
> .ap_index_of_response    http_protocol.c              1     1      0.0 19
> .ap_finalize_request_protocol protocol.c                   2     1      0.0 19
> .ap_add_output_filter_handle util_filter.c                3     3      0.0 18
> .apr_table_setn          glink.s                      3     3      0.0 18
> .ap_update_child_status  scoreboard.c                 2     2      0.0 14
> .strncasecmp             glink.s                      2     2      0.0 12
> .apr_table_unset         glink.s                      2     2      0.0 12
> .apr_pstrcatv            glink.s                      2     2      0.0 12
> .apr_array_make          glink.s                      2     2      0.0 12
> .isdigit                 glink.s                      2     2      0.0 12
> .ap_add_input_filter_handle util_filter.c                2     2      0.0 12
> .apr_time_now            glink.s                      2     2      0.0 12
> .apr_table_do            glink.s                      2     2      0.0 12
> .ap_regexec              util.c                       2     1      0.0 9
> .ap_add_output_filter    util_filter.c                1     1      0.0 8
> .ap_get_limit_req_body   core.c                       1     1      0.0 7
> .apr_psprintf            glink.s                      1     1      0.0 6
> .apr_uri_parse           glink.s                      1     1      0.0 6
> .apr_table_mergen        glink.s                      1     1      0.0 6
> .apr_pstrmemdup          glink.s                      1     1      0.0 6
> .apr_bucket_eos_create   glink.s                      1     1      0.0 6
> .apr_table_overlap       glink.s                      1     1      0.0 6
> .apr_brigade_cleanup     glink.s                      1     1      0.0 6
> .apr_bucket_flush_create glink.s                      1     1      0.0 6
> .apr_parse_addr_port     glink.s                      1     1      0.0 6
> .apr_pool_create_ex      glink.s                      1     1      0.0 6
> .apr_pstrdup             glink.s                      1     1      0.0 6
> .apr_pool_destroy        glink.s                      1     1      0.0 6
> .apr_sendv               glink.s                      1     1      0.0 6
> .apr_off_t_toa           glink.s                      1     1      0.0 6
> .ap_get_server_version   core.c                       1     1      0.0 5
> .ap_explode_recent_gmt   util_time.c                  1     1      0.0 4
> .ap_graceful_stop_signalled worker.c                     1     1      0.0 3
> .ap_create_request_config config.c                     1     1      0.0 1
> 
> Shlib Subroutine         Source File                  Visit Enter  %   Ticks
> ================         ===========                  ===== ===== ==== =====
> .apr_brigade_puts        apr_brigade.c                20    20     4.4 3259
> .__divu64                divu64.s                     9     9      4.0 3006
> .apr_palloc              apr_pools.c                  120   120    2.9 2160
> .apr_table_get           apr_tables.c                 33    28     2.9 2145
> .__divi64                divi64.s                     5     5      1.9 1413
> .apr_brigade_write       apr_brigade.c                54    25     1.7 1262
> .strlen                  strlen.s                     35    35     1.7 1260
> .apr_table_setn          apr_tables.c                 27    13     1.4 1025
> .__is_wctype_std         libc/__is_wctype_std.c       49    49     1.3 980
> .strcasecmp              libaixinet/strcasecmp.c      10    10     1.1 829
> .memset                  memset.s                     11    11     1.1 805
> .apr_setsocketopt        sockopt.c                    13    13     1.0 779
> ._moveeq                 moveeq.s                     26    25     1.0 768
> .memchr                  libc/memchr.c                11    11     1.0 718
> .apr_bucket_alloc        apr_buckets_alloc.c          34    30     0.9 664
> .apr_vformatter          apr_snprintf.c               8     2      0.9 663
> .apr_brigade_cleanup     apr_brigade.c                59    19     0.8 632
> .apr_brigade_create      apr_brigade.c                66    22     0.8 594
> .apr_pool_cleanup_register apr_pools.c                  50    25     0.8 575
> .apr_brigade_split_line  apr_brigade.c                32    6      0.7 534
> .config_log_transaction  mod_log_config.c             47    1      0.7 525
> .apr_array_push_noclear  apr_tables.c                 24    22     0.7 502
> .isspace                 libc/isspace.c               46    23     0.6 460
> .apr_bucket_free         apr_buckets_alloc.c          34    30     0.5 402
> .isupper                 libc/isupper.c               40    20     0.5 400
> ._ptrgl                  ptrgl.s                      64    64     0.5 384
> .make_array_core         apr_tables.c                 28    13     0.5 362
> .process_item            mod_log_config.c             28    14     0.5 350
> .tolower                 libc/tolower.c               40    20     0.5 340
> .apr_table_unset         apr_tables.c                 3     2      0.5 339
> .apr_table_vdo           apr_tables.c                 10    2      0.4 322
> ._ptrgl                  ptrgl.s                      53    53     0.4 318
> .apr_bucket_simple_split apr_buckets_simple.c         20    10     0.4 310
> .heap_bucket_read        apr_buckets_heap.c           31    31     0.4 279
> .apr_palloc              glink.s                      45    45     0.4 270
> .apr_bucket_simple_copy  apr_buckets_simple.c         28    14     0.4 266
> ._moveeq                 moveeq.s                     18    18     0.4 264
> .allocator_alloc         apr_pools.c                  5     5      0.4 263
> .strchr                  strchr.s                     5     5      0.4 263
> .heap_bucket_destroy     apr_buckets_heap.c           40    17     0.3 259
> .match_headers           mod_setenvif.c               10    1      0.3 247
> .match_boyer_moore_horspool apr_strmatch.c               6     6      0.3 233
> .allocator_free          apr_pools.c                  5     5      0.3 229
> .apr_pool_cleanup_kill   apr_pools.c                  14    14     0.3 217
> .strncasecmp             libaixinet/strcasecmp.c      3     3      0.3 216
> .match_boyer_moore_horspool_nocase apr_strmatch.c               23    3      0.3 212
> .apr_table_overlap       apr_tables.c                 13    1      0.3 205
> .apr_brigade_destroy     apr_brigade.c                36    12     0.3 204
> .conv_10                 apr_snprintf.c               17    3      0.3 193
> .apr_table_make          apr_tables.c                 30    10     0.3 190
> .apr_brigade_length      apr_brigade.c                5     5      0.3 190
> .apr_bucket_shared_split apr_buckets_refcount.c       20    10     0.2 180
> .apr_bucket_alloc        glink.s                      30    30     0.2 180
> .pthread_mutex_lock      libpthreads/mutex.c          6     3      0.2 177
> .apr_pstrdup             apr_strings.c                25    7      0.2 168
> .time_base_to_time       libc/POWER/time_base_to_time.c 12    3      0.2 168
> .overlap_hash            apr_tables.c                 4     4      0.2 168
> .cache_url_handler       mod_cache.c                  19    1      0.2 165
> ._moveeq                 moveeq.s                     15    15     0.2 157
> .apr_table_addn          apr_tables.c                 8     4      0.2 156
> .apr_pstrcatv            apr_strings.c                11    2      0.2 154
> .unserialize_table       mod_mem_cache.c              14    4      0.2 151
> .apr_brigade_partition   apr_brigade.c                8     4      0.2 144
> .find_entry              cache_hash.c                 3     1      0.2 140
> .apr_pool_cleanup_register glink.s                      23    23     0.2 138
> .ap_cache_check_freshness cache_util.c                 8     1      0.2 134
> .apr_palloc              glink.s                      22    22     0.2 132
> .apr_off_t_toa           apr_strings.c                10    2      0.2 128
> .gettimeofday            libc/gettimeofday.c          12    3      0.2 126
> .apr_brigade_split       apr_brigade.c                8     4      0.2 120
> .pthread_mutex_unlock    libpthreads/mutex.c          6     3      0.2 120
> .apr_uri_parse           apr_uri.c                    3     1      0.2 120
> .tolower                 glink.s                      20    20     0.2 120
> .socket_bucket_read      apr_buckets_socket.c         12    2      0.2 118
> .apr_bucket_shared_destroy glink.s                      19    19     0.2 114
> .apr_bucket_shared_destroy apr_buckets_refcount.c       19    19     0.2 114
> .apr_table_mergen        apr_tables.c                 2     1      0.1 111
> .apr_table_set           apr_tables.c                 4     1      0.1 109
> .apr_pvsprintf           apr_pools.c                  4     2      0.1 98
> .apr_bucket_heap_make    apr_buckets_heap.c           9     3      0.1 96
> ._ptrgl                  ptrgl.s                      16    16     0.1 96
> .apr_recv                sendrecv.c                   7     2      0.1 95
> .strlen                  glink.s                      14    14     0.1 84
> .open_entity             mod_mem_cache.c              9     1      0.1 84
> .read_real_time          read_real_time.s             3     3      0.1 84
> ._ptrgl                  ptrgl.s                      14    14     0.1 84
> .apr_pool_cleanup_kill   glink.s                      13    13     0.1 78
> .apr_time_now            time.c                       9     3      0.1 75
> .cache_select_url        cache_storage.c              8     1      0.1 73
> .read                    libc/read.c                  5     2      0.1 70
> .apr_pool_create_ex      apr_pools.c                  3     1      0.1 67
> .apr_bucket_shared_copy  apr_buckets_refcount.c       8     4      0.1 64
> .apr_itoa                apr_strings.c                5     1      0.1 64
> .run_cleanups            apr_pools.c                  8     1      0.1 62
> .apr_mmap_create         mmap.c                       5     1      0.1 62
> .apr_parse_addr_port     sockaddr.c                   5     1      0.1 61
> .memchr                  glink.s                      10    10     0.1 60
> .apr_table_setn          glink.s                      10    10     0.1 60
> .islower                 libc/islower.c               6     3      0.1 60
> .apr_bucket_simple_split glink.s                      10    10     0.1 60
> .isdigit                 libc/isdigit.c               6     3      0.1 60
> .read_headers            mod_mem_cache.c              11    1      0.1 60
> .ap_cache_get_cachetype  cache_util.c                 3     1      0.1 57
> ._Errno                  libc/errno.c                 6     3      0.1 57
> .apr_pool_destroy        apr_pools.c                  7     1      0.1 53
> .apr_bucket_heap_create  apr_buckets_heap.c           6     2      0.1 52
> .cache_out_filter        mod_cache.c                  6     1      0.1 50
> .strcasecmp              glink.s                      8     8      0.1 48
> .apr_table_get           glink.s                      8     8      0.1 48
> .log_request_time        mod_log_config.c             4     1      0.1 48
> .apr_thread_mutex_unlock thread_mutex.c               6     3      0.1 48
> .__pthread_geterrno_addr libpthreads/lib_lock.c       4     4      0.1 48
> .apr_thread_mutex_lock   thread_mutex.c               6     3      0.1 48
> .read_body               mod_mem_cache.c              4     1      0.1 47
> .apr_pstrmemdup          apr_strings.c                6     2      0.1 46
> .apr_sendv               sendrecv.c                   2     1      0.1 46
> .ap_cache_tokstr         cache_util.c                 5     1      0.1 45
> .strlen                  glink.s                      7     7      0.1 42
> .ap_cache_liststr        cache_util.c                 4     2      0.1 42
> .apr_array_make          apr_tables.c                 6     2      0.1 40
> .apr_bucket_eos_create   apr_buckets_eos.c            6     2      0.1 40
> .file_make_mmap          apr_buckets_file.c           4     1      0.1 39
> .spin_unlock_global_ppc_up locks_ppc_up.s               3     3      0.1 39
> .ap_cache_current_age    cache_util.c                 2     1      0.1 38
> .multi_log_transaction   mod_log_config.c             2     1      0.0 37
> .apr_table_do            apr_tables.c                 4     2      0.0 36
> .memcmp                  memcmp.s                     1     1      0.0 36
> .apr_psprintf            apr_pools.c                  4     2      0.0 36
> ._ptrgl                  ptrgl.s                      6     6      0.0 36
> .apr_bucket_mmap_make    apr_buckets_mmap.c           4     1      0.0 36
> .apr_bucket_free         glink.s                      6     6      0.0 36
> .apr_os_file_put         open.c                       3     1      0.0 35
> .spin_lock_global_ppc_up locks_ppc_up.s               3     3      0.0 33
> .file_bucket_read        apr_buckets_file.c           3     1      0.0 33
> .cache_run_open_entity   cache_storage.c              2     1      0.0 32
> .apr_file_write          readwrite.c                  2     1      0.0 32
> .apr_pstrndup            apr_strings.c                4     1      0.0 31
> .cache_update            cache_cache.c                5     1      0.0 31
> .mmap_bucket_destroy     apr_buckets_mmap.c           5     1      0.0 31
> .apr_atomic_dec          apr_atomic.c                 4     1      0.0 31
> .memset                  glink.s                      5     5      0.0 30
> .apr_bucket_file_make    apr_buckets_file.c           3     1      0.0 30
> .strlen                  glink.s                      5     5      0.0 30
> .apr_bucket_shared_make  apr_buckets_refcount.c       5     5      0.0 30
> .apr_bucket_shared_make  glink.s                      5     5      0.0 30
> .apr_atomic_inc          apr_atomic.c                 4     1      0.0 29
> .apr_bucket_file_create  apr_buckets_file.c           3     1      0.0 28
> .memcache_gdsf_algorithm mod_mem_cache.c              3     2      0.0 28
> .write                   libc/write.c                 2     1      0.0 27
> .mmap_bucket_read        apr_buckets_mmap.c           2     1      0.0 27
> .decrement_refcount      mod_mem_cache.c              2     1      0.0 26
> .ap_headers_insert_output_filter mod_headers.c                1     1      0.0 25
> .mmap_cleanup            mmap.c                       2     1      0.0 25
> .apr_palloc              glink.s                      4     4      0.0 24
> .ap_log_error            glink.s                      4     4      0.0 24
> ._ptrgl                  ptrgl.s                      4     4      0.0 24
> .apr_allocator_alloc     glink.s                      4     4      0.0 24
> .apr_bucket_simple_copy  glink.s                      4     4      0.0 24
> .apr_table_make          glink.s                      4     4      0.0 24
> ._ptrgl                  ptrgl.s                      4     4      0.0 24
> .apr_allocator_free      glink.s                      4     4      0.0 24
> .cache_read_entity_headers cache_storage.c              3     1      0.0 22
> .cache_pq_change_priority cache_pqueue.c               3     1      0.0 21
> .apr_bucket_socket_create apr_buckets_socket.c         3     1      0.0 21
> .writev                  libc/write.c                 2     1      0.0 20
> .apr_bucket_flush_create apr_buckets_flush.c          3     1      0.0 20
> .pthread_mutex_lock      glink.s                      3     3      0.0 18
> .islower                 glink.s                      3     3      0.0 18
> .gettimeofday            glink.s                      3     3      0.0 18
> ._Errno                  glink.s                      3     3      0.0 18
> .pthread_mutex_unlock    glink.s                      3     3      0.0 18
> .apr_pstrdup             glink.s                      3     3      0.0 18
> .cache_generate_key_default cache_storage.c              2     1      0.0 17
> .cache_pq_percolate_down cache_pqueue.c               2     1      0.0 17
> .minchild                cache_pqueue.c               1     1      0.0 17
> .log_request_line        mod_log_config.c             1     1      0.0 16
> .file_bucket_destroy     apr_buckets_file.c           3     1      0.0 15
> .log_remote_user         mod_log_config.c             1     1      0.0 15
> .clf_log_bytes_sent      mod_log_config.c             2     1      0.0 15
> .apr_bucket_eos_make     apr_buckets_eos.c            2     2      0.0 14
> .cache_hash_get          cache_hash.c                 2     1      0.0 14
> .log_remote_host         mod_log_config.c             2     1      0.0 14
> .free_proc_chain         apr_pools.c                  1     1      0.0 14
> .constant_item           mod_log_config.c             7     7      0.0 14
> .apr_mmap_delete         mmap.c                       2     1      0.0 13
> .cache_find              cache_cache.c                2     1      0.0 13
> .apr_thread_mutex_lock   glink.s                      2     2      0.0 12
> .apr_thread_mutex_unlock glink.s                      2     2      0.0 12
> .apr_pstrdup             glink.s                      2     2      0.0 12
> .apr_vformatter          glink.s                      2     2      0.0 12
> .kread                   glink.s                      2     2      0.0 12
> .read                    glink.s                      2     2      0.0 12
> .ap_cache_liststr        glink.s                      2     2      0.0 12
> .apr_setsocketopt        glink.s                      2     2      0.0 12
> .apr_palloc              glink.s                      2     2      0.0 12
> .apr_pool_cleanup_run    apr_pools.c                  3     1      0.0 12
> .apr_bucket_heap_create  glink.s                      2     2      0.0 12
> .strcasecmp              glink.s                      2     2      0.0 12
> .ap_pass_brigade         glink.s                      2     2      0.0 12
> .apr_recv                glink.s                      2     2      0.0 12
> .apr_mmap_offset         common.c                     1     1      0.0 10
> .cache_read_entity_body  cache_storage.c              2     1      0.0 10
> .format_integer          mod_log_config.c             2     1      0.0 9
> .log_remote_logname      mod_log_config.c             2     1      0.0 9
> .memcache_inc_frequency  mod_mem_cache.c              1     1      0.0 7
> .apr_bucket_flush_make   apr_buckets_flush.c          1     1      0.0 7
> .apr_bucket_socket_make  apr_buckets_socket.c         1     1      0.0 7
> .cache_find              glink.s                      1     1      0.0 6
> .ap_set_content_type     glink.s                      1     1      0.0 6
> .apr_pstrdup             glink.s                      1     1      0.0 6
> .apr_bucket_eos_create   glink.s                      1     1      0.0 6
> .apr_bucket_file_create  glink.s                      1     1      0.0 6
> .apr_os_file_put         glink.s                      1     1      0.0 6
> .apr_atomic_dec          glink.s                      1     1      0.0 6
> .apr_thread_mutex_unlock glink.s                      1     1      0.0 6
> .apr_atomic_inc          glink.s                      1     1      0.0 6
> .apr_thread_mutex_lock   glink.s                      1     1      0.0 6
> .apr_pool_cleanup_register glink.s                      1     1      0.0 6
> .memcmp                  glink.s                      1     1      0.0 6
> .ap_cache_tokstr         glink.s                      1     1      0.0 6
> .ap_cache_check_freshness glink.s                      1     1      0.0 6
> .cache_select_url        glink.s                      1     1      0.0 6
> .ap_cache_get_cachetype  glink.s                      1     1      0.0 6
> .cache_read_entity_body  glink.s                      1     1      0.0 6
> .apr_brigade_create      glink.s                      1     1      0.0 6
> .ap_run_insert_filter    glink.s                      1     1      0.0 6
> .ap_meets_conditions     glink.s                      1     1      0.0 6
> .ap_add_output_filter    glink.s                      1     1      0.0 6
> .apr_time_now            glink.s                      1     1      0.0 6
> .apr_table_set           glink.s                      1     1      0.0 6
> .apr_psprintf            glink.s                      1     1      0.0 6
> .strncasecmp             glink.s                      1     1      0.0 6
> .apr_pstrndup            glink.s                      1     1      0.0 6
> .isspace                 glink.s                      1     1      0.0 6
> .strchr                  glink.s                      1     1      0.0 6
> .ap_remove_output_filter glink.s                      1     1      0.0 6
> .memset                  glink.s                      1     1      0.0 6
> .mmap                    glink.s                      1     1      0.0 6
> .munmap                  glink.s                      1     1      0.0 6
> .apr_getsocketopt        sockopt.c                    1     1      0.0 6
> .apr_pool_cleanup_register glink.s                      1     1      0.0 6
> .apr_pool_cleanup_run    glink.s                      1     1      0.0 6
> .kwrite                  glink.s                      1     1      0.0 6
> .kwritev                 glink.s                      1     1      0.0 6
> .apr_pstrmemdup          glink.s                      1     1      0.0 6
> .memset                  glink.s                      1     1      0.0 6
> .brigade_cleanup         apr_brigade.c                6     6      0.0 6
> .isdigit                 glink.s                      1     1      0.0 6
> .apr_getsocketopt        glink.s                      1     1      0.0 6
> .apr_bucket_mmap_make    glink.s                      1     1      0.0 6
> .apr_bucket_heap_make    glink.s                      1     1      0.0 6
> .apr_mmap_offset         glink.s                      1     1      0.0 6
> .apr_mmap_delete         glink.s                      1     1      0.0 6
> .apr_file_write          glink.s                      1     1      0.0 6
> .__divi64                glink.s                      1     1      0.0 6
> .apr_off_t_toa           glink.s                      1     1      0.0 6
> .memchr                  glink.s                      1     1      0.0 6
> .apr_mmap_create         glink.s                      1     1      0.0 6
> .ap_get_remote_logname   glink.s                      1     1      0.0 6
> .cache_hash_get          glink.s                      1     1      0.0 6
> .ap_get_remote_host      glink.s                      1     1      0.0 6
> .apr_itoa                glink.s                      1     1      0.0 6
> .cache_pq_change_priority glink.s                      1     1      0.0 6
> .write                   glink.s                      1     1      0.0 6
> .writev                  glink.s                      1     1      0.0 6
> .cache_update            glink.s                      1     1      0.0 6
> .ap_regexec              glink.s                      1     1      0.0 6
> .apr_table_get           glink.s                      1     1      0.0 6
> .strlen                  glink.s                      1     1      0.0 6
> .apr_allocator_alloc     apr_pools.c                  4     4      0.0 4
> .apr_allocator_free      apr_pools.c                  4     4      0.0 4
> .log_status              mod_log_config.c             1     1      0.0 4
> .apr_allocator_owner_get apr_pools.c                  2     2      0.0 4
> .eos_bucket_read         apr_buckets_eos.c            1     1      0.0 4
> .apr_allocator_mutex_get apr_pools.c                  2     2      0.0 4
> .apr_bucket_destroy_noop apr_buckets.c                3     3      0.0 3
> .memcache_get_pos        mod_mem_cache.c              1     1      0.0 3
> .pfmt                    mod_log_config.c             1     1      0.0 3
> .apr_bucket_setaside_noop apr_buckets.c                1     1      0.0 2
> 




Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <br...@cnet.com>.
Brian Pane wrote:

> Bill Stoddard wrote:
>
>> mod_mem_cache bypasses almost all of those things. I am using mod_mem_cache
>> to cache a buffer in heap (the contents of a 500-byte file).  I have even
>> hacked the code slightly to turn off the pipelined request optimization.
>> I am playing with a new tool, so I don't fully grok the output just yet,
>> but here is what I see serving a single keep-alive request out of
>> mod_mem_cache.
>>
>
> By the way, in your test case with 500-byte files, were the connections
> keep-alive?  If so, do you see better results with files >8KB (which
> don't suffer from the known performance problem in core_output_filter)?


Never mind, I suddenly realized that that's what you meant
by hacking the pipeline optimization code.

--Brian




Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <bp...@pacbell.net>.
Bill Stoddard wrote:

>mod_mem_cache bypasses almost all of those things. I am using mod_mem_cache
>to cache a buffer in heap (the contents of a 500-byte file).  I have even
>hacked the code slightly to turn off the pipelined request optimization.
>I am playing with a new tool, so I don't fully grok the output just yet,
>but here is what I see serving a single keep-alive request out of
>mod_mem_cache.
>

By the way, in your test case with 500-byte files, were the connections
keep-alive?  If so, do you see better results with files >8KB (which don't
suffer from the known performance problem in core_output_filter)?

--Brian



RE: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Bill Stoddard <bi...@wstoddard.com>.
> >> The time spent in ap_brigade_puts is
> >> surprising...  This particular run indicates that it took 74355
> >> instructions to serve a keep-alive request.
> >>
> >
> > I've seen the brigade_puts overhead in my testing, too...and it
> > is definitely surprising, since the code is relatively minimal.
> > The only obvious (potential) speedup I can think of would be to
> > replace the char-at-a-time loop with a memcpy (with checks to
> > make sure it doesn't overflow the available size).  I'll try this
> > over the weekend.
>
>
> I remembered why memcpy won't help here: we don't know the
> length in advance.  But I managed to speed up apr_brigade_puts()
> by about 30% in my tests by optimizing its main loop.  Does this
> patch reduce the apr_brigade_puts() overhead in your test environment?
>
> --Brian

Haven't had a chance to profile it yet, but the patch seems to provide a 1
to 2% improvement in throughput serving a 500 byte file out of
mod_mem_cache.

Bill


Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <bp...@pacbell.net>.
Cliff Woolley wrote:

>On Sat, 29 Jun 2002, Brian Pane wrote:
>
>  
>
>>I tried this, and it didn't unroll the loop.  That's probably
>because some of the information needed to unroll the loop effectively
>>is unknown to the compiler.
>>    
>>
>
>Hm.  Okay, well, if we're going to do this, can we split it out into a
>separate macro (my_strncpy or something) so it's clear what's going on and
>to avoid cluttering up that function?
>

I have reservations about making it a macro, because that would
confuse debuggers and any profilers that do basic-block or line-level
profiling.

I also don't mind cluttering the function.  The value of
a low-level I/O API like apr_brigade_puts() is that it hides
ugly buffer management details from application code, so that
the application code can stay clean and simple.

--Brian



Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Sat, 29 Jun 2002, Cliff Woolley wrote:

> Also, isn't it true that your patch now causes the buffer bucket to always
> have 0-7 unused bytes at the end?

Oh duh, never mind on this point, my fault.  I misread the loop condition.

--Cliff


Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <bp...@pacbell.net>.
Cliff Woolley wrote:

>On Sat, 29 Jun 2002, Cliff Woolley wrote:
>
>  
>
>>some way that would allow us to coalesce the writes.
>>    
>>
>
>Alignment issues would kill us here, wouldn't they?  That sucks.  Grrrr.....
>  
>

We might be able to get some additional improvements by
doing word-at-a-time operations for half of the copy operation:
  - start with the current byte-at-a-time loop
  - as soon as "buf" points to a word-aligned address,
    switch to a mode in which we grab the next sizeof(int)
    bytes from the input string, pack them into an int
    (with ifdef'ed code for big- and little-endian machines),
    and write the int to the target address.

But that might or might not actually be faster (we'd be doing
more instructions in order to do fewer memory writes).  And
it's more complicated than the unrolled-loop code, of course.
So for now, I'll stick with the unrolled-loop implementation,
since it's showing good results in benchmark testing.
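
For illustration only, the word-at-a-time idea above might be sketched
as follows.  This is not code from any patch: the name copy_wordwise and
the fixed 4-byte word are assumptions, and memcpy() is swapped in for the
ifdef'ed big-/little-endian packing so that the sketch stays portable.
Note that every input byte is still tested for the terminator; only the
writes are coalesced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not APR code: copy bytes from str into buf until
 * the string ends or avail bytes have been written.  Returns the number
 * of bytes copied. */
size_t copy_wordwise(char *buf, size_t avail, const char *str)
{
    char *start = buf;
    char *end = buf + avail;

    assert(buf != NULL && str != NULL);

    /* Byte-at-a-time until the target address is word-aligned. */
    while (buf < end && *str && ((uintptr_t)buf % sizeof(uint32_t)) != 0) {
        *buf++ = *str++;
    }

    /* Word-at-a-time: the four input bytes are tested one by one
     * (short-circuit, so nothing past the terminator is read). */
    while ((size_t)(end - buf) >= sizeof(uint32_t)
           && str[0] && str[1] && str[2] && str[3]) {
        uint32_t word;
        memcpy(&word, str, sizeof(word)); /* pack four input bytes */
        memcpy(buf, &word, sizeof(word)); /* typically compiled to one store */
        buf += sizeof(word);
        str += sizeof(word);
    }

    /* Tail: fewer than four bytes of room or of input remain. */
    while (buf < end && *str) {
        *buf++ = *str++;
    }
    return (size_t)(buf - start);
}
```

Whether this beats the unrolled byte loop would have to be measured; the
sketch only shows where the extra instructions come from.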

--Brian



Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Ben Laurie <be...@algroup.co.uk>.
Cliff Woolley wrote:
> On Sat, 29 Jun 2002, Cliff Woolley wrote:
> 
> 
>>some way that would allow us to coalesce the writes.
> 
> 
> Alignment issues would kill us here, wouldn't they?  That sucks.  Grrrr.....

Depends on the CPU, but if you are feeling energetic you can also align 
the copies. The problem is detecting the EOS.

Cheers,

Ben.

-- 
http://www.apache-ssl.org/ben.html       http://www.thebunker.net/

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff


Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Sat, 29 Jun 2002, Cliff Woolley wrote:

> some way that would allow us to coalesce the writes.

Alignment issues would kill us here, wouldn't they?  That sucks.  Grrrr.....


Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Sat, 29 Jun 2002, Brian Pane wrote:

> I tried this, and it didn't unroll the loop.  That's probably
> because some of the information needed to unroll the loop effectively
> is unknown to the compiler.

Hm.  Okay, well, if we're going to do this, can we split it out into a
separate macro (my_strncpy or something) so it's clear what's going on and
to avoid cluttering up that function?

Also, isn't it true that your patch now causes the buffer bucket to always
have 0-7 unused bytes at the end?  I'd have to go back and look more
carefully to be sure, but that was the impression I got from first glance.

I also feel like there *has* to be some better way to check for EOS...
some way that would allow us to coalesce the writes.  But I haven't
figured out what that is yet.  I'll keep thinking about it.

--Cliff


Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <bp...@pacbell.net>.
Roy T. Fielding wrote:

> A better optimization might be to reduce the number of calls to
> brigade_puts.  That's how much of 1.3 was improved.


I only know of three ways to reduce the number of apr_brigade_puts()
calls in 2.0:

  * Send fewer fields in the HTTP response header.

  * Or do more buffering prior to calling apr_brigade_puts().
    (This is what 2.0 used to do, and it was even slower, because
    it added yet another layer of memory copying before the socket
    write.)

  * Or produce a separate bucket for each field in the response
    header, and rely on writev to patch them together.
    (This won't work in 2.0; if the number of tiny buckets
    grows too large, core_output_filter() will try to consolidate
    them into a single bucket, with the associated memcpy cost.)
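
As a rough illustration of the third option, outside of the bucket
brigade machinery, the sketch below hands several header fields to a
single POSIX writev() call without copying them into one buffer first.
The name send_fields and the limit of 16 fields are assumptions made for
the example; this is not how httpd's output filters are written.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FIELDS 16  /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: write count NUL-terminated header fields to fd
 * with one writev() call.  Returns the number of bytes written, or -1
 * on error or if count is out of range. */
ssize_t send_fields(int fd, const char *const *fields, int count)
{
    struct iovec iov[MAX_FIELDS];
    int i;

    assert(fields != NULL);
    if (count < 0 || count > MAX_FIELDS) {
        return -1;
    }
    for (i = 0; i < count; i++) {
        /* writev() only reads from the buffers; the cast drops const
         * because iov_base is declared as void *. */
        iov[i].iov_base = (void *)fields[i];
        iov[i].iov_len = strlen(fields[i]);
    }
    return writev(fd, iov, count);
}
```

A real caller would also have to handle short writes, which this sketch
leaves out.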

Were you thinking of a different approach from these?

--Brian




Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by "Roy T. Fielding" <fi...@apache.org>.
A better optimization might be to reduce the number of calls to
brigade_puts.  That's how much of 1.3 was improved.

....Roy


Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <bp...@pacbell.net>.
Cliff Woolley wrote:

>On Fri, 28 Jun 2002, Brian Pane wrote:
>
>  
>
>>I remembered why memcpy won't help here: we don't know the
>>length in advance.  But I managed to speed up apr_brigade_puts()
>>by about 30% in my tests by optimizing its main loop.  Does this
>>patch reduce the apr_brigade_puts() overhead in your test environment?
>>    
>>
>
>Why won't the compiler unroll this loop for you?
>
>gcc -O3 -funroll-loops
>

I tried this, and it didn't unroll the loop.  That's probably
because some of the information needed to unroll the loop effectively
is unknown to the compiler.  The condition for continuing this
loop is: 1) not at the end of the input string, and 2) not at
the end of the target bucket.  We have a "lookahead" capability
on the second condition, but not on the first one.  I.e., we know
how many more bytes remain in the target bucket, and thus we can
unroll the loop into blocks of 'n' character-copy operations with
a check for 'n' available bytes of writable buffer space only
once per iteration.  (We also know that, for small values of 'n',
there are almost always more than 'n' bytes left in the bucket,
so that we can actually take advantage of this optimization in
the real world.)  In contrast, the check for end-of-string
can't be unrolled very effectively: there's no way to avoid
having to put a conditional branch in front of every "*buf++=*str++"
operation.  Thus the patch unrolls the loop in a way that reduces
the number of end-of-bucket checks, even though it's impossible to
reduce the number of end-of-string checks.
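
To make the shape of that unrolling concrete, here is a minimal sketch.
It is not the actual patch: the function name copy_unrolled, its
signature, and the unroll factor of 4 are all assumptions chosen for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the actual APR patch: copy bytes from str
 * into buf until the string ends or avail bytes have been written.
 * Returns the number of bytes copied and sets *rest to the first
 * character of str that was not copied. */
size_t copy_unrolled(char *buf, size_t avail, const char *str,
                     const char **rest)
{
    char *start = buf;
    char *end = buf + avail;

    assert(buf != NULL && str != NULL && rest != NULL);

    /* Fast path: at least 4 bytes of room, so the end-of-bucket check
     * runs once per four copies.  The end-of-string check still has to
     * precede every single copy. */
    while (end - buf >= 4) {
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
        if (!*str) goto done;
        *buf++ = *str++;
    }

    /* Slow path: fewer than 4 bytes of room, check both conditions. */
    while (buf < end && *str) {
        *buf++ = *str++;
    }

done:
    *rest = str;
    return (size_t)(buf - start);
}
```

If *rest does not point at the terminator on return, the target buffer
filled up first and the caller has to continue in a new buffer.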

--Brian



Re: [PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, Brian Pane wrote:

> I remembered why memcpy won't help here: we don't know the
> length in advance.  But I managed to speed up apr_brigade_puts()
> by about 30% in my tests by optimizing its main loop.  Does this
> patch reduce the apr_brigade_puts() overhead in your test environment?

Why won't the compiler unroll this loop for you?

gcc -O3 -funroll-loops

--Cliff


[PATCH] Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <bp...@pacbell.net>.
Brian Pane wrote:

> Bill Stoddard wrote:
> ...


>> The time spent in ap_brigade_puts is
>> surprising...  This particular run indicates that it took 74355
>> instructions to serve a keep-alive request.
>>
>
> I've seen the brigade_puts overhead in my testing, too...and it
> is definitely surprising, since the code is relatively minimal.
> The only obvious (potential) speedup I can think of would be to
> replace the char-at-a-time loop with a memcpy (with checks to
> make sure it doesn't overflow the available size).  I'll try this
> over the weekend. 


I remembered why memcpy won't help here: we don't know the
length in advance.  But I managed to speed up apr_brigade_puts()
by about 30% in my tests by optimizing its main loop.  Does this
patch reduce the apr_brigade_puts() overhead in your test environment?

--Brian


Re: 2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <br...@cnet.com>.
Bill Stoddard wrote:
...

>mod_mem_cache bypasses almost all of those things. I am using mod_mem_cache
>to cache a buffer in heap (the contents of a 500-byte file).  I have even
>hacked the code slightly to turn off the pipelined request optimization.
>I am playing with a new tool, so I don't fully grok the output just yet,
>but here is what I see serving a single keep-alive request out of
>mod_mem_cache.
>

Thanks for the profile data!

> The divisions are
>real killers (at least on this machine). This suggests that we need a call,
>apr_time_now_sec(), to fetch the time with seconds resolution (configurable).
>Right now we do multiplications to convert to microseconds and then
>divisions to convert back to seconds in a number of routines.
>

I've seen the rem and div code taking a long time in 32-bit mode
on Sparcs, too.  At last count, we have at least three proposed
solutions: a new API that returns seconds, converting the 64-bit
int to a struct with separate seconds and usec, and Will Rowe's
binary microseconds proposal.  Any of them would be an improvement.
I'll add a call for votes on these to APR's STATUS file.
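
To show where the division comes from, here is a small sketch contrasting
a single 64-bit microsecond count with a struct that keeps seconds and
microseconds apart.  All names in it (usec_time_t, struct split_time, and
the helper functions) are hypothetical and are not APR's API; the binary
microseconds proposal is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit "microseconds since the epoch"
 * time value. */
typedef int64_t usec_time_t;

#define USEC_PER_SEC INT64_C(1000000)

/* Hypothetical alternative: seconds and microseconds kept apart. */
struct split_time {
    int64_t sec;
    int32_t usec;
};

/* Building the 64-bit value costs a 64-bit multiplication... */
usec_time_t make_usec_time(int64_t sec, int32_t usec)
{
    assert(usec >= 0 && usec < USEC_PER_SEC);
    return sec * USEC_PER_SEC + usec;
}

/* ...and getting whole seconds back costs a 64-bit division, which is
 * expensive on machines without a native 64-bit divide. */
int64_t usec_time_to_sec(usec_time_t t)
{
    return t / USEC_PER_SEC;
}

/* With the split representation, the seconds are just a field read. */
int64_t split_time_to_sec(const struct split_time *t)
{
    return t->sec;
}
```

A seconds-resolution call would avoid the round trip entirely for callers
that never need the microseconds.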

>The time spent in ap_brigade_puts is
>surprising...  This particular run indicates that it took 74355 instructions
>to serve a keep-alive request.
>

I've seen the brigade_puts overhead in my testing, too...and it
is definitely surprising, since the code is relatively minimal.
The only obvious (potential) speedup I can think of would be to
replace the char-at-a-time loop with a memcpy (with checks to
make sure it doesn't overflow the available size).  I'll try this
over the weekend.

--Brian



RE: 2.0 performance Re: Breaking something? Now is the time?

Posted by Bill Stoddard <bi...@wstoddard.com>.
> Bill Stoddard wrote:
>
> >Yes, please, we need some performance measurements.  I've been doing some
> >profiling of Apache 2.0 on AIX and even with mod_mem_cache, we still serve
> >static files with keep-alive at about half the rate of iPlanet. The sad
> >thing is I don't see any single smoking gun. Just lots of little stuff
> >everywhere.
> >
> >
>
> Indeed, all the big problems have been fixed, and what remains is a long
> list of small things to optimize.  My list includes:
>
>   * the buffering of keepalive responses < 8KB (which turns sendfile
>     operations into mmap+memcpy)
>   * lots of string operations in directory_walk/location_walk/file_walk
>   * the code that creates and destroys a temporary brigade for each line
>     in order to read the request header
>   * our memory usage is a bit higher than it probably should be
>   * regex comparisons in file_walk and mod_setenvif
>   * mod_mime's find_ct() does too much string manipulation
>   * apr_table_get (even with all the optimizations that we've already done)
>
> Do you have additional things that you've found in your profiling?
>
> Thanks,
> --Brian

mod_mem_cache bypasses almost all of those things. I am using mod_mem_cache
to cache a buffer in heap (the contents of a 500-byte file).  I have even
hacked the code slightly to turn off the pipelined request optimization.
I am playing with a new tool, so I don't fully grok the output just yet,
but here is what I see serving a single keep-alive request out of
mod_mem_cache.

All modules are DSOs (and the link overhead shows this). The divisions are
real killers (at least on this machine). This suggests that we need a call,
apr_time_now_sec(), to fetch the time with seconds resolution (configurable).
Right now we do multiplications to convert to microseconds and then divisions
to convert back to seconds in a number of routines. The time spent in
apr_brigade_puts is surprising... This particular run indicates that it took
74355 instructions to serve a keep-alive request. This profile was taken on
unmodified Apache 2.0 code. Well, almost... avoid_xlc_bug is a hack to get
around a bug in the xlc optimizer. I'll gladly post the hack to the list if
you want to see it. The density of RING macros in core_input_filter f*cks up
the xlc optimizer...
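
To make the multiply-then-divide point concrete, here is a minimal sketch,
assuming a 64-bit microsecond timestamp like apr_time_t. Every sketch_* name
is hypothetical and invented for illustration; none of this is existing APR
code, and sketch_time_now_sec() is only one guess at what the suggested
apr_time_now_sec() could look like. The 64-bit divide is presumably what shows
up as __divu64 and __divi64 in the profile below.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for apr_time_t: microseconds since the epoch. */
typedef int64_t sketch_time_t;

#define SKETCH_USEC_PER_SEC INT64_C(1000000)

/* Current pattern, step 1: build a microsecond timestamp (64-bit multiply). */
static sketch_time_t sketch_time_make(int64_t sec, int32_t usec)
{
    assert(usec >= 0 && usec < SKETCH_USEC_PER_SEC);
    return sec * SKETCH_USEC_PER_SEC + usec;
}

/* Current pattern, step 2: divide to get whole seconds back (64-bit divide). */
static int64_t sketch_time_sec(sketch_time_t t)
{
    return t / SKETCH_USEC_PER_SEC;
}

/* Hypothetical seconds-resolution call: no multiply and no divide at all. */
static int64_t sketch_time_now_sec(void)
{
    return (int64_t)time(NULL);
}
```

A caller that only needs seconds (a date header, say) would call
sketch_time_now_sec() directly instead of round-tripping through
sketch_time_make() and sketch_time_sec().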

Space                      %    Ticks
=====                     ====  =====
User                      15.2  11300
Shared Library            56.5  42028
Kernel                    28.3  21027

Total                           74355 (Footprint = 84.523 KB)

Pid   Process Name         %    Ticks
===== ============        ====  =====
23454 httpd              100.0  74355

Tid   Process Name         %    Ticks
===== ============        ====  =====
49159 httpd              100.0  74355


./httpd :

Subroutine Name          Source File                  Visit Enter  %   Ticks
===============          ===========                  ===== ===== ==== =====
.core_input_filter       core.c                       51    11     1.6 1165
.ap_rgetline_core        protocol.c                   66    6      1.4 1061
.net_time_filter         core.c                       36    12     0.8 615
.core_output_filter      core.c                       17    3      0.7 520
.add_any_filter_handle   util_filter.c                12    6      0.6 478
._ptrgl                  ptrgl.s                      74    74     0.6 444
.form_header_field       http_protocol.c              50    10     0.5 390
.ap_get_brigade          util_filter.c                50    25     0.4 325
.fix_hostname            vhost.c                      2     1      0.3 259
.add_any_filter          util_filter.c                1     1      0.3 257
.ap_getword_white        util.c                       28    2      0.3 254
.ap_find_token           util.c                       5     3      0.3 229
.ap_get_mime_headers     protocol.c                   16    1      0.3 228
._moveeq                 moveeq.s                     14    14     0.3 217
.pcre_exec               pcre.c                       4     1      0.3 209
.ap_recent_rfc822_date   util_time.c                  15    1      0.3 190
.match                   pcre.c                       1     1      0.2 181
.ap_pass_brigade         util_filter.c                14    7      0.2 169
.ap_set_keepalive        http_protocol.c              12    1      0.2 152
.apr_brigade_write       glink.s                      25    25     0.2 150
.ap_content_length_filter protocol.c                   6     1      0.2 145
.ap_read_request         protocol.c                   18    1      0.2 132
.ap_http_header_filter   http_protocol.c              15    1      0.2 132
.apr_palloc              glink.s                      22    22     0.2 132
.isspace                 glink.s                      22    22     0.2 132
.read_request_line       protocol.c                   11    1      0.2 121
.log_error_core          log.c                        4     4      0.2 120
.apr_brigade_puts        glink.s                      20    20     0.2 120
.apr_table_get           glink.s                      19    19     0.2 114
.apr_brigade_create      glink.s                      17    17     0.1 102
.ap_http_filter          http_protocol.c              8     2      0.1 99
.ap_log_error            log.c                        8     4      0.1 88
.basic_http_header       http_protocol.c              13    1      0.1 84
.ap_update_child_status_from_indexes scoreboard.c                 2     2      0.1 80
.ap_make_content_type    protocol.c                   7     1      0.1 78
.apr_brigade_destroy     glink.s                      12    12     0.1 72
.ap_save_brigade         util_filter.c                3     1      0.1 70
.ap_discard_request_body http_protocol.c              4     1      0.1 68
.ap_meets_conditions     http_protocol.c              8     1      0.1 67
.apr_setsocketopt        glink.s                      11    11     0.1 66
.cached_explode          util_time.c                  5     1      0.1 65
.regexec                 pcreposix.c                  3     1      0.1 60
.core_create_req         core.c                       5     1      0.1 58
.ap_get_remote_host      core.c                       1     1      0.1 56
.remove_any_filter       util_filter.c                3     3      0.1 48
.strlen                  glink.s                      8     8      0.1 48
.check_pipeline_flush    http_request.c               5     1      0.1 45
.writev_it_all           core.c                       2     1      0.1 45
.ap_run_create_request   request.c                    3     1      0.1 45
.avoid_xlc_bug           core.c                       11    11     0.1 44
.ap_parse_uri            protocol.c                   3     1      0.1 44
.basic_http_header_check http_protocol.c              3     1      0.1 41
.ap_run_insert_filter    request.c                    3     1      0.1 38
.http_create_request     http_core.c                  4     1      0.1 38
.ap_process_http_connection http_core.c                  5     0      0.1 38
.terminate_header        http_protocol.c              4     1      0.1 38
.apr_table_make          glink.s                      6     6      0.0 36
.apr_brigade_split_line  glink.s                      6     6      0.0 36
.ap_run_post_read_request protocol.c                   2     1      0.0 34
.ap_run_log_transaction  protocol.c                   2     1      0.0 33
.core_insert_filter      core.c                       1     1      0.0 32
.ap_set_byterange        http_protocol.c              3     1      0.0 32
.ap_update_vhost_from_headers vhost.c                      3     1      0.0 31
.fixup_vary              http_protocol.c              3     1      0.0 30
.apr_brigade_length      glink.s                      5     5      0.0 30
.ap_remove_output_filter util_filter.c                3     3      0.0 30
.ap_byterange_filter     http_protocol.c              4     1      0.0 29
.ap_run_quick_handler    config.c                     2     1      0.0 28
.lookup_builtin_method   http_protocol.c              1     1      0.0 27
.ap_process_request      http_request.c               5     1      0.0 27
.ap_make_method_list     http_protocol.c              3     1      0.0 25
.apr_brigade_split       glink.s                      4     4      0.0 24
.apr_table_addn          glink.s                      4     4      0.0 24
.memset                  glink.s                      4     4      0.0 24
.strchr                  glink.s                      4     4      0.0 24
.__divi64                glink.s                      4     4      0.0 24
.apr_brigade_partition   glink.s                      4     4      0.0 24
.ap_add_output_filters_by_type core.c                       1     1      0.0 23
.create_empty_config     config.c                     3     1      0.0 21
.ap_set_content_type     http_protocol.c              2     1      0.0 21
.ap_method_number_of     http_protocol.c              3     1      0.0 20
.ap_get_remote_logname   core.c                       1     1      0.0 20
.ap_set_content_length   protocol.c                   3     1      0.0 20
.ap_index_of_response    http_protocol.c              1     1      0.0 19
.ap_finalize_request_protocol protocol.c                   2     1      0.0 19
.ap_add_output_filter_handle util_filter.c                3     3      0.0 18
.apr_table_setn          glink.s                      3     3      0.0 18
.ap_update_child_status  scoreboard.c                 2     2      0.0 14
.strncasecmp             glink.s                      2     2      0.0 12
.apr_table_unset         glink.s                      2     2      0.0 12
.apr_pstrcatv            glink.s                      2     2      0.0 12
.apr_array_make          glink.s                      2     2      0.0 12
.isdigit                 glink.s                      2     2      0.0 12
.ap_add_input_filter_handle util_filter.c                2     2      0.0 12
.apr_time_now            glink.s                      2     2      0.0 12
.apr_table_do            glink.s                      2     2      0.0 12
.ap_regexec              util.c                       2     1      0.0 9
.ap_add_output_filter    util_filter.c                1     1      0.0 8
.ap_get_limit_req_body   core.c                       1     1      0.0 7
.apr_psprintf            glink.s                      1     1      0.0 6
.apr_uri_parse           glink.s                      1     1      0.0 6
.apr_table_mergen        glink.s                      1     1      0.0 6
.apr_pstrmemdup          glink.s                      1     1      0.0 6
.apr_bucket_eos_create   glink.s                      1     1      0.0 6
.apr_table_overlap       glink.s                      1     1      0.0 6
.apr_brigade_cleanup     glink.s                      1     1      0.0 6
.apr_bucket_flush_create glink.s                      1     1      0.0 6
.apr_parse_addr_port     glink.s                      1     1      0.0 6
.apr_pool_create_ex      glink.s                      1     1      0.0 6
.apr_pstrdup             glink.s                      1     1      0.0 6
.apr_pool_destroy        glink.s                      1     1      0.0 6
.apr_sendv               glink.s                      1     1      0.0 6
.apr_off_t_toa           glink.s                      1     1      0.0 6
.ap_get_server_version   core.c                       1     1      0.0 5
.ap_explode_recent_gmt   util_time.c                  1     1      0.0 4
.ap_graceful_stop_signalled worker.c                     1     1      0.0 3
.ap_create_request_config config.c                     1     1      0.0 1

Shlib Subroutine         Source File                  Visit Enter  %   Ticks
================         ===========                  ===== ===== ==== =====
.apr_brigade_puts        apr_brigade.c                20    20     4.4 3259
.__divu64                divu64.s                     9     9      4.0 3006
.apr_palloc              apr_pools.c                  120   120    2.9 2160
.apr_table_get           apr_tables.c                 33    28     2.9 2145
.__divi64                divi64.s                     5     5      1.9 1413
.apr_brigade_write       apr_brigade.c                54    25     1.7 1262
.strlen                  strlen.s                     35    35     1.7 1260
.apr_table_setn          apr_tables.c                 27    13     1.4 1025
.__is_wctype_std         libc/__is_wctype_std.c       49    49     1.3 980
.strcasecmp              libaixinet/strcasecmp.c      10    10     1.1 829
.memset                  memset.s                     11    11     1.1 805
.apr_setsocketopt        sockopt.c                    13    13     1.0 779
._moveeq                 moveeq.s                     26    25     1.0 768
.memchr                  libc/memchr.c                11    11     1.0 718
.apr_bucket_alloc        apr_buckets_alloc.c          34    30     0.9 664
.apr_vformatter          apr_snprintf.c               8     2      0.9 663
.apr_brigade_cleanup     apr_brigade.c                59    19     0.8 632
.apr_brigade_create      apr_brigade.c                66    22     0.8 594
.apr_pool_cleanup_register apr_pools.c                  50    25     0.8 575
.apr_brigade_split_line  apr_brigade.c                32    6      0.7 534
.config_log_transaction  mod_log_config.c             47    1      0.7 525
.apr_array_push_noclear  apr_tables.c                 24    22     0.7 502
.isspace                 libc/isspace.c               46    23     0.6 460
.apr_bucket_free         apr_buckets_alloc.c          34    30     0.5 402
.isupper                 libc/isupper.c               40    20     0.5 400
._ptrgl                  ptrgl.s                      64    64     0.5 384
.make_array_core         apr_tables.c                 28    13     0.5 362
.process_item            mod_log_config.c             28    14     0.5 350
.tolower                 libc/tolower.c               40    20     0.5 340
.apr_table_unset         apr_tables.c                 3     2      0.5 339
.apr_table_vdo           apr_tables.c                 10    2      0.4 322
._ptrgl                  ptrgl.s                      53    53     0.4 318
.apr_bucket_simple_split apr_buckets_simple.c         20    10     0.4 310
.heap_bucket_read        apr_buckets_heap.c           31    31     0.4 279
.apr_palloc              glink.s                      45    45     0.4 270
.apr_bucket_simple_copy  apr_buckets_simple.c         28    14     0.4 266
._moveeq                 moveeq.s                     18    18     0.4 264
.allocator_alloc         apr_pools.c                  5     5      0.4 263
.strchr                  strchr.s                     5     5      0.4 263
.heap_bucket_destroy     apr_buckets_heap.c           40    17     0.3 259
.match_headers           mod_setenvif.c               10    1      0.3 247
.match_boyer_moore_horspool apr_strmatch.c               6     6      0.3 233
.allocator_free          apr_pools.c                  5     5      0.3 229
.apr_pool_cleanup_kill   apr_pools.c                  14    14     0.3 217
.strncasecmp             libaixinet/strcasecmp.c      3     3      0.3 216
.match_boyer_moore_horspool_nocase apr_strmatch.c               23    3      0.3 212
.apr_table_overlap       apr_tables.c                 13    1      0.3 205
.apr_brigade_destroy     apr_brigade.c                36    12     0.3 204
.conv_10                 apr_snprintf.c               17    3      0.3 193
.apr_table_make          apr_tables.c                 30    10     0.3 190
.apr_brigade_length      apr_brigade.c                5     5      0.3 190
.apr_bucket_shared_split apr_buckets_refcount.c       20    10     0.2 180
.apr_bucket_alloc        glink.s                      30    30     0.2 180
.pthread_mutex_lock      libpthreads/mutex.c          6     3      0.2 177
.apr_pstrdup             apr_strings.c                25    7      0.2 168
.time_base_to_time       libc/POWER/time_base_to_time.c 12    3      0.2 168
.overlap_hash            apr_tables.c                 4     4      0.2 168
.cache_url_handler       mod_cache.c                  19    1      0.2 165
._moveeq                 moveeq.s                     15    15     0.2 157
.apr_table_addn          apr_tables.c                 8     4      0.2 156
.apr_pstrcatv            apr_strings.c                11    2      0.2 154
.unserialize_table       mod_mem_cache.c              14    4      0.2 151
.apr_brigade_partition   apr_brigade.c                8     4      0.2 144
.find_entry              cache_hash.c                 3     1      0.2 140
.apr_pool_cleanup_register glink.s                      23    23     0.2 138
.ap_cache_check_freshness cache_util.c                 8     1      0.2 134
.apr_palloc              glink.s                      22    22     0.2 132
.apr_off_t_toa           apr_strings.c                10    2      0.2 128
.gettimeofday            libc/gettimeofday.c          12    3      0.2 126
.apr_brigade_split       apr_brigade.c                8     4      0.2 120
.pthread_mutex_unlock    libpthreads/mutex.c          6     3      0.2 120
.apr_uri_parse           apr_uri.c                    3     1      0.2 120
.tolower                 glink.s                      20    20     0.2 120
.socket_bucket_read      apr_buckets_socket.c         12    2      0.2 118
.apr_bucket_shared_destroy glink.s                      19    19     0.2 114
.apr_bucket_shared_destroy apr_buckets_refcount.c       19    19     0.2 114
.apr_table_mergen        apr_tables.c                 2     1      0.1 111
.apr_table_set           apr_tables.c                 4     1      0.1 109
.apr_pvsprintf           apr_pools.c                  4     2      0.1 98
.apr_bucket_heap_make    apr_buckets_heap.c           9     3      0.1 96
._ptrgl                  ptrgl.s                      16    16     0.1 96
.apr_recv                sendrecv.c                   7     2      0.1 95
.strlen                  glink.s                      14    14     0.1 84
.open_entity             mod_mem_cache.c              9     1      0.1 84
.read_real_time          read_real_time.s             3     3      0.1 84
._ptrgl                  ptrgl.s                      14    14     0.1 84
.apr_pool_cleanup_kill   glink.s                      13    13     0.1 78
.apr_time_now            time.c                       9     3      0.1 75
.cache_select_url        cache_storage.c              8     1      0.1 73
.read                    libc/read.c                  5     2      0.1 70
.apr_pool_create_ex      apr_pools.c                  3     1      0.1 67
.apr_bucket_shared_copy  apr_buckets_refcount.c       8     4      0.1 64
.apr_itoa                apr_strings.c                5     1      0.1 64
.run_cleanups            apr_pools.c                  8     1      0.1 62
.apr_mmap_create         mmap.c                       5     1      0.1 62
.apr_parse_addr_port     sockaddr.c                   5     1      0.1 61
.memchr                  glink.s                      10    10     0.1 60
.apr_table_setn          glink.s                      10    10     0.1 60
.islower                 libc/islower.c               6     3      0.1 60
.apr_bucket_simple_split glink.s                      10    10     0.1 60
.isdigit                 libc/isdigit.c               6     3      0.1 60
.read_headers            mod_mem_cache.c              11    1      0.1 60
.ap_cache_get_cachetype  cache_util.c                 3     1      0.1 57
._Errno                  libc/errno.c                 6     3      0.1 57
.apr_pool_destroy        apr_pools.c                  7     1      0.1 53
.apr_bucket_heap_create  apr_buckets_heap.c           6     2      0.1 52
.cache_out_filter        mod_cache.c                  6     1      0.1 50
.strcasecmp              glink.s                      8     8      0.1 48
.apr_table_get           glink.s                      8     8      0.1 48
.log_request_time        mod_log_config.c             4     1      0.1 48
.apr_thread_mutex_unlock thread_mutex.c               6     3      0.1 48
.__pthread_geterrno_addr libpthreads/lib_lock.c       4     4      0.1 48
.apr_thread_mutex_lock   thread_mutex.c               6     3      0.1 48
.read_body               mod_mem_cache.c              4     1      0.1 47
.apr_pstrmemdup          apr_strings.c                6     2      0.1 46
.apr_sendv               sendrecv.c                   2     1      0.1 46
.ap_cache_tokstr         cache_util.c                 5     1      0.1 45
.strlen                  glink.s                      7     7      0.1 42
.ap_cache_liststr        cache_util.c                 4     2      0.1 42
.apr_array_make          apr_tables.c                 6     2      0.1 40
.apr_bucket_eos_create   apr_buckets_eos.c            6     2      0.1 40
.file_make_mmap          apr_buckets_file.c           4     1      0.1 39
.spin_unlock_global_ppc_up locks_ppc_up.s               3     3      0.1 39
.ap_cache_current_age    cache_util.c                 2     1      0.1 38
.multi_log_transaction   mod_log_config.c             2     1      0.0 37
.apr_table_do            apr_tables.c                 4     2      0.0 36
.memcmp                  memcmp.s                     1     1      0.0 36
.apr_psprintf            apr_pools.c                  4     2      0.0 36
._ptrgl                  ptrgl.s                      6     6      0.0 36
.apr_bucket_mmap_make    apr_buckets_mmap.c           4     1      0.0 36
.apr_bucket_free         glink.s                      6     6      0.0 36
.apr_os_file_put         open.c                       3     1      0.0 35
.spin_lock_global_ppc_up locks_ppc_up.s               3     3      0.0 33
.file_bucket_read        apr_buckets_file.c           3     1      0.0 33
.cache_run_open_entity   cache_storage.c              2     1      0.0 32
.apr_file_write          readwrite.c                  2     1      0.0 32
.apr_pstrndup            apr_strings.c                4     1      0.0 31
.cache_update            cache_cache.c                5     1      0.0 31
.mmap_bucket_destroy     apr_buckets_mmap.c           5     1      0.0 31
.apr_atomic_dec          apr_atomic.c                 4     1      0.0 31
.memset                  glink.s                      5     5      0.0 30
.apr_bucket_file_make    apr_buckets_file.c           3     1      0.0 30
.strlen                  glink.s                      5     5      0.0 30
.apr_bucket_shared_make  apr_buckets_refcount.c       5     5      0.0 30
.apr_bucket_shared_make  glink.s                      5     5      0.0 30
.apr_atomic_inc          apr_atomic.c                 4     1      0.0 29
.apr_bucket_file_create  apr_buckets_file.c           3     1      0.0 28
.memcache_gdsf_algorithm mod_mem_cache.c              3     2      0.0 28
.write                   libc/write.c                 2     1      0.0 27
.mmap_bucket_read        apr_buckets_mmap.c           2     1      0.0 27
.decrement_refcount      mod_mem_cache.c              2     1      0.0 26
.ap_headers_insert_output_filter mod_headers.c                1     1      0.0 25
.mmap_cleanup            mmap.c                       2     1      0.0 25
.apr_palloc              glink.s                      4     4      0.0 24
.ap_log_error            glink.s                      4     4      0.0 24
._ptrgl                  ptrgl.s                      4     4      0.0 24
.apr_allocator_alloc     glink.s                      4     4      0.0 24
.apr_bucket_simple_copy  glink.s                      4     4      0.0 24
.apr_table_make          glink.s                      4     4      0.0 24
._ptrgl                  ptrgl.s                      4     4      0.0 24
.apr_allocator_free      glink.s                      4     4      0.0 24
.cache_read_entity_headers cache_storage.c              3     1      0.0 22
.cache_pq_change_priority cache_pqueue.c               3     1      0.0 21
.apr_bucket_socket_create apr_buckets_socket.c         3     1      0.0 21
.writev                  libc/write.c                 2     1      0.0 20
.apr_bucket_flush_create apr_buckets_flush.c          3     1      0.0 20
.pthread_mutex_lock      glink.s                      3     3      0.0 18
.islower                 glink.s                      3     3      0.0 18
.gettimeofday            glink.s                      3     3      0.0 18
._Errno                  glink.s                      3     3      0.0 18
.pthread_mutex_unlock    glink.s                      3     3      0.0 18
.apr_pstrdup             glink.s                      3     3      0.0 18
.cache_generate_key_default cache_storage.c              2     1      0.0 17
.cache_pq_percolate_down cache_pqueue.c               2     1      0.0 17
.minchild                cache_pqueue.c               1     1      0.0 17
.log_request_line        mod_log_config.c             1     1      0.0 16
.file_bucket_destroy     apr_buckets_file.c           3     1      0.0 15
.log_remote_user         mod_log_config.c             1     1      0.0 15
.clf_log_bytes_sent      mod_log_config.c             2     1      0.0 15
.apr_bucket_eos_make     apr_buckets_eos.c            2     2      0.0 14
.cache_hash_get          cache_hash.c                 2     1      0.0 14
.log_remote_host         mod_log_config.c             2     1      0.0 14
.free_proc_chain         apr_pools.c                  1     1      0.0 14
.constant_item           mod_log_config.c             7     7      0.0 14
.apr_mmap_delete         mmap.c                       2     1      0.0 13
.cache_find              cache_cache.c                2     1      0.0 13
.apr_thread_mutex_lock   glink.s                      2     2      0.0 12
.apr_thread_mutex_unlock glink.s                      2     2      0.0 12
.apr_pstrdup             glink.s                      2     2      0.0 12
.apr_vformatter          glink.s                      2     2      0.0 12
.kread                   glink.s                      2     2      0.0 12
.read                    glink.s                      2     2      0.0 12
.ap_cache_liststr        glink.s                      2     2      0.0 12
.apr_setsocketopt        glink.s                      2     2      0.0 12
.apr_palloc              glink.s                      2     2      0.0 12
.apr_pool_cleanup_run    apr_pools.c                  3     1      0.0 12
.apr_bucket_heap_create  glink.s                      2     2      0.0 12
.strcasecmp              glink.s                      2     2      0.0 12
.ap_pass_brigade         glink.s                      2     2      0.0 12
.apr_recv                glink.s                      2     2      0.0 12
.apr_mmap_offset         common.c                     1     1      0.0 10
.cache_read_entity_body  cache_storage.c              2     1      0.0 10
.format_integer          mod_log_config.c             2     1      0.0 9
.log_remote_logname      mod_log_config.c             2     1      0.0 9
.memcache_inc_frequency  mod_mem_cache.c              1     1      0.0 7
.apr_bucket_flush_make   apr_buckets_flush.c          1     1      0.0 7
.apr_bucket_socket_make  apr_buckets_socket.c         1     1      0.0 7
.cache_find              glink.s                      1     1      0.0 6
.ap_set_content_type     glink.s                      1     1      0.0 6
.apr_pstrdup             glink.s                      1     1      0.0 6
.apr_bucket_eos_create   glink.s                      1     1      0.0 6
.apr_bucket_file_create  glink.s                      1     1      0.0 6
.apr_os_file_put         glink.s                      1     1      0.0 6
.apr_atomic_dec          glink.s                      1     1      0.0 6
.apr_thread_mutex_unlock glink.s                      1     1      0.0 6
.apr_atomic_inc          glink.s                      1     1      0.0 6
.apr_thread_mutex_lock   glink.s                      1     1      0.0 6
.apr_pool_cleanup_register glink.s                      1     1      0.0 6
.memcmp                  glink.s                      1     1      0.0 6
.ap_cache_tokstr         glink.s                      1     1      0.0 6
.ap_cache_check_freshness glink.s                      1     1      0.0 6
.cache_select_url        glink.s                      1     1      0.0 6
.ap_cache_get_cachetype  glink.s                      1     1      0.0 6
.cache_read_entity_body  glink.s                      1     1      0.0 6
.apr_brigade_create      glink.s                      1     1      0.0 6
.ap_run_insert_filter    glink.s                      1     1      0.0 6
.ap_meets_conditions     glink.s                      1     1      0.0 6
.ap_add_output_filter    glink.s                      1     1      0.0 6
.apr_time_now            glink.s                      1     1      0.0 6
.apr_table_set           glink.s                      1     1      0.0 6
.apr_psprintf            glink.s                      1     1      0.0 6
.strncasecmp             glink.s                      1     1      0.0 6
.apr_pstrndup            glink.s                      1     1      0.0 6
.isspace                 glink.s                      1     1      0.0 6
.strchr                  glink.s                      1     1      0.0 6
.ap_remove_output_filter glink.s                      1     1      0.0 6
.memset                  glink.s                      1     1      0.0 6
.mmap                    glink.s                      1     1      0.0 6
.munmap                  glink.s                      1     1      0.0 6
.apr_getsocketopt        sockopt.c                    1     1      0.0 6
.apr_pool_cleanup_register glink.s                      1     1      0.0 6
.apr_pool_cleanup_run    glink.s                      1     1      0.0 6
.kwrite                  glink.s                      1     1      0.0 6
.kwritev                 glink.s                      1     1      0.0 6
.apr_pstrmemdup          glink.s                      1     1      0.0 6
.memset                  glink.s                      1     1      0.0 6
.brigade_cleanup         apr_brigade.c                6     6      0.0 6
.isdigit                 glink.s                      1     1      0.0 6
.apr_getsocketopt        glink.s                      1     1      0.0 6
.apr_bucket_mmap_make    glink.s                      1     1      0.0 6
.apr_bucket_heap_make    glink.s                      1     1      0.0 6
.apr_mmap_offset         glink.s                      1     1      0.0 6
.apr_mmap_delete         glink.s                      1     1      0.0 6
.apr_file_write          glink.s                      1     1      0.0 6
.__divi64                glink.s                      1     1      0.0 6
.apr_off_t_toa           glink.s                      1     1      0.0 6
.memchr                  glink.s                      1     1      0.0 6
.apr_mmap_create         glink.s                      1     1      0.0 6
.ap_get_remote_logname   glink.s                      1     1      0.0 6
.cache_hash_get          glink.s                      1     1      0.0 6
.ap_get_remote_host      glink.s                      1     1      0.0 6
.apr_itoa                glink.s                      1     1      0.0 6
.cache_pq_change_priority glink.s                      1     1      0.0 6
.write                   glink.s                      1     1      0.0 6
.writev                  glink.s                      1     1      0.0 6
.cache_update            glink.s                      1     1      0.0 6
.ap_regexec              glink.s                      1     1      0.0 6
.apr_table_get           glink.s                      1     1      0.0 6
.strlen                  glink.s                      1     1      0.0 6
.apr_allocator_alloc     apr_pools.c                  4     4      0.0 4
.apr_allocator_free      apr_pools.c                  4     4      0.0 4
.log_status              mod_log_config.c             1     1      0.0 4
.apr_allocator_owner_get apr_pools.c                  2     2      0.0 4
.eos_bucket_read         apr_buckets_eos.c            1     1      0.0 4
.apr_allocator_mutex_get apr_pools.c                  2     2      0.0 4
.apr_bucket_destroy_noop apr_buckets.c                3     3      0.0 3
.memcache_get_pos        mod_mem_cache.c              1     1      0.0 3
.pfmt                    mod_log_config.c             1     1      0.0 3
.apr_bucket_setaside_noop apr_buckets.c                1     1      0.0 2


2.0 performance Re: Breaking something? Now is the time?

Posted by Brian Pane <br...@cnet.com>.
[moving to dev@httpd due to shifting topic]

Bill Stoddard wrote:

>Yes, please, we need some performance measurements.  I've been doing some
>profiling of Apache 2.0 on AIX and even with mod_mem_cache, we still serve
>static files with keep-alive at about half the rate of iPlanet. The sad
>thing is I don't see any single smoking guns. Just lots of little stuff
>everywhere.
>  
>

Indeed, all the big problems have been fixed, and what remains is a long
list of small things to optimize.  My list includes:

  * the buffering of keepalive responses < 8KB (which turns sendfile
    operations into mmap+memcpy)
  * lots of string operations in directory_walk/location_walk/file_walk
  * the code that creates and destroys a temporary brigade for each line
    in order to read the request header
  * our memory usage is a bit higher than it probably should be
  * regex comparisons in file_walk and mod_setenvif
  * mod_mime's find_ct() does too much string manipulation
  * apr_table_get (even with all the optimizations that we've already done)

Do you have additional things that you've found in your profiling?

Thanks,
--Brian




RE: Breaking something? Now is the time?

Posted by Bill Stoddard <bi...@wstoddard.com>.
> On Fri, Jun 28, 2002 at 12:11:09PM -0700, Brian Pane wrote:
> > I want to break something: binary compatibility for the pool API.
> >
> > This has been on my list for a long time, but I haven't yet had
> > time to implement it.
> >
> > What I'm thinking of is the following:
> >
> > * Preface the apr_pool_t structure with a set of function
> >   pointers for the pool's "methods": alloc, free, destroy,
> >   create subpool, etc.
>
> Sounds like SMS.  We could never overcome speed limitations and we
> always seemed to place blame on the function pointers as the reason
> why the SMS performance wasn't as good as pools.
>
> I'd want to see performance metrics saying that we aren't going to
> see a massive performance decrease with this.  -- justin
>

Yes, please, we need some performance measurements.  I've been doing some
profiling of Apache 2.0 on AIX and even with mod_mem_cache, we still serve
static files with keep-alive at about half the rate of iPlanet. The sad
thing is I don't see any single smoking guns. Just lots of little stuff
everywhere.

Bill


Re: Breaking something? Now is the time?

Posted by Justin Erenkrantz <je...@apache.org>.
On Fri, Jun 28, 2002 at 12:11:09PM -0700, Brian Pane wrote:
> I want to break something: binary compatibility for the pool API.
> 
> This has been on my list for a long time, but I haven't yet had
> time to implement it.
> 
> What I'm thinking of is the following:
> 
> * Preface the apr_pool_t structure with a set of function
>   pointers for the pool's "methods": alloc, free, destroy,
>   create subpool, etc.

Sounds like SMS.  We could never overcome speed limitations and we
always seemed to place blame on the function pointers as the reason
why the SMS performance wasn't as good as pools.  

I'd want to see performance metrics saying that we aren't going to
see a massive performance decrease with this.  -- justin
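One rough way to gather such numbers is a micro-benchmark that times the same trivial operation through a direct call and through a function pointer. The sketch below is only an illustration: the names `bump`, `bump_fn`, `run_direct`, `run_indirect`, and `seconds_for` are made up for this example and are not part of APR or SMS, and a toy loop like this says little about real pool behaviour under httpd.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for an allocator's hot path: just advances a counter. */
static size_t bump(size_t total, size_t size)
{
    return total + size;
}

/* volatile pointer, so the compiler cannot resolve the indirect
 * call at compile time and quietly turn it into a direct one. */
static size_t (*volatile bump_fn)(size_t, size_t) = bump;

/* Direct calls; an optimizing compiler is free to inline these,
 * which is part of what the comparison is meant to expose. */
static size_t run_direct(long iters)
{
    size_t total = 0;
    for (long i = 0; i < iters; i++)
        total = bump(total, 16);
    return total;
}

/* The same work, dispatched through the function pointer. */
static size_t run_indirect(long iters)
{
    size_t total = 0;
    for (long i = 0; i < iters; i++)
        total = (*bump_fn)(total, 16);
    return total;
}

/* Runs one variant and returns the CPU time it took, in seconds. */
static double seconds_for(size_t (*run)(long), long iters, size_t *result)
{
    assert(result != NULL);
    clock_t start = clock();
    *result = run(iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `seconds_for(run_direct, n, &r)` and `seconds_for(run_indirect, n, &r)` with a large `n` gives two timings to compare. Both variants should produce the same `r`, which is a quick check that they did the same work.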

Re: Breaking something? Now is the time?

Posted by Brian Pane <br...@cnet.com>.
I want to break something: binary compatibility for the pool API.

This has been on my list for a long time, but I haven't yet had
time to implement it.

What I'm thinking of is the following:

* Preface the apr_pool_t structure with a set of function
  pointers for the pool's "methods": alloc, free, destroy,
  create subpool, etc.

* Replace the current pool functions with macros that call
  the right method for a given pool:
    #define apr_palloc(p, size)  (*((p)->alloc_fn))((p), (size))
  (The point of using a macro for this is to avoid the performance
  impact of adding another function call per apr_palloc.)

This will let us introduce new pool variants, like reaps for
example, without requiring changes to either the pool framework
or anyone's application code.
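A minimal sketch of the idea above, in C. Everything named here (`toy_pool_t`, `alloc_fn` as laid out below, `toy_palloc`, `bump_alloc`, `toy_pool_init`) is hypothetical and is not APR's actual definition. A real pool would also carry the free, destroy, and create-subpool methods, which are left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool type: the method pointer comes first, so every
 * pool variant shares the same prefix. This is NOT the real apr_pool_t. */
typedef struct toy_pool toy_pool_t;
struct toy_pool {
    void *(*alloc_fn)(toy_pool_t *p, size_t size);
    /* State for this particular variant follows the method pointer. */
    char  *buf;
    size_t used;
    size_t cap;
};

/* Dispatch macro in the style of the proposal. Arguments are
 * parenthesized so that expressions can be passed safely. */
#define toy_palloc(p, size) ((*((p)->alloc_fn))((p), (size)))

/* One possible variant: a bump allocator over a fixed buffer. */
static void *bump_alloc(toy_pool_t *p, size_t size)
{
    /* Round the request up to a multiple of 8 bytes. */
    size_t aligned = (size + 7u) & ~(size_t)7u;

    /* Reject on arithmetic overflow or when the buffer is exhausted. */
    if (aligned < size || aligned > p->cap - p->used)
        return NULL;

    void *mem = p->buf + p->used;
    p->used += aligned;
    return mem;
}

/* Binds the bump-allocator method to a caller-supplied buffer. */
static void toy_pool_init(toy_pool_t *p, char *buf, size_t cap)
{
    assert(p != NULL && buf != NULL);
    p->alloc_fn = bump_alloc;
    p->buf = buf;
    p->used = 0;
    p->cap = cap;
}
```

Callers only ever write `toy_palloc(&pool, n)`, so a different variant could be introduced by pointing `alloc_fn` at another function, without touching the calling code. One caveat of the macro form is that `p` is evaluated twice, so an argument with side effects would misbehave.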

--Brian



Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, William A. Rowe, Jr. wrote:

> If Cliff wants to commit the semantic change to apr_table_[v]do, I'll
> +1 here and raise you a _NONSTD correction.  Along with Sander's
> changes to make the unsafe transparent apr_allocator.h structure
> opaque, I'd say we have a bit of worthwhile breakage to inflict before
> we go on. By the way, 99.5% of coders will be unaffected by any of
> these three changes. They can take advantage of the apr_table_[v]do
> change or ignore it.

So you didn't indicate an opinion on whether the existing semantics of
apr_table_vdo() match their documentation, and if not, whether it's the
docs or the implementation that have it right.  I need to know in order to
proceed with the return-type change.

Thanks...

--Cliff


Re: Breaking something? Now is the time?

Posted by Cliff Woolley <jw...@virginia.edu>.
On Fri, 28 Jun 2002, William A. Rowe, Jr. wrote:

> I have one bit that must be broken before 1.0, and cannot be remedied easily.
> I'd like to simply break these things before Apache 2.0.40 is tagged.

+1 on all counts.  2.0.40 will already require a full recompile anyway.
Other users of APR must understand that some things must be fixed prior to
APR 1.0, and these are they.

--Cliff