You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by gr...@apache.org on 2004/01/16 19:02:01 UTC

[PATCH] block speed up

I'm interested in doing some scalability testing with worker on Linux to see 
what the O(1) scheduler and new pthread library buys us, and what happens with 
different values for ThreadsPerChild.  I decided to use a simple handler that 
just nanosleep()s for a variable amount of time controlled by the client.  I ran 
oprofile and saw that the kernel's directory lookup function was the top CPU 
eater.  An strace revealed that we were doing a directory walk, resulting in a 
stat() and 7 lstat()s per request.

My config to enable the handler is

<Location /silly>
         SetHandler s2-handler
</Location>

...which is the same way we enable mod_status and mod_info.  The key thing here 
is that the URIs to access a Location enabled handler do not map to the 
filesystem, so the directory walk is a waste of cycles.  So what can we do about 
it?  Maybe detect that the current URI matches such a block, and skip the 
filesystem mapping if so.  That's exactly what this patch does.

Throughput measured by ab increases from 3053 requests/sec to 3984 with this 
patch, slightly more than 30% (dual 400MHz PII, both pegged, 256M RAM, RH 7.3). 
  This should make it easier to see how the OS changes etc affect our scalability.

I really don't grok how the cache works in ap_location_walk so I wouldn't be 
shocked if I haven't implemented this optimally.  But it seems to do the Right 
Thing with the nanosleep handler, mod_status, and static files.  I also 
confirmed that we do get a full directory walk if I type in a garbage URL.

Comments?
Greg

Re: [PATCH] block speed up

Posted by gr...@apache.org.
Brian Akins wrote:
> How "wrong" would it be to have a map_to_storage that simply filled out 
> r->finfo and returned OK (or the appropriate error) always?  IE, no 
> directory or file walk.  From what I understand, Directory and File 
> directives would be useless, but Location would still work.  Is this 
> correct?

When the URI represents a file, the directory walk fills in r->finfo fairly 
efficiently  - just one stat() typically followed by an open() from the handler. 
  I'd like to see that change to an open() - fstat() sequence to be even more 
efficient, but that's a different topic.

What I'm trying to do here is deal with the case where the URI is known to be 
completely outside the filesystem, like server-status for example.  r->finfo is 
meaningless in that case - there is no file as far as the core code can tell.

Greg


Re: [PATCH] block speed up

Posted by Brian Akins <ba...@web.turner.com>.
How "wrong" would it be to have a map_to_storage that simply filled out 
r->finfo and returned OK (or the appropriate error) always?  IE, no 
directory or file walk.  From what I understand, Directory and File 
directives would be useless, but Location would still work.  Is this 
correct?



-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies

Re: [PATCH] block speed up

Posted by gr...@apache.org.
gregames@apache.org wrote:
> Bill Stoddard wrote:
> 
>> WHat happens if the handler DECLINES the request 
> 
> 
> My guess is that the default handler will be called, try to open a 
> non-existant file and send back a 404.  I'll find out.
> 
>> and do we care?
> 
> 
> If it seg faults, violates protocol or something similar, I care.  Other 
> than that I don't think so.
> 
> mod_status and mod_info both are enabled via <Location > containers.  
> mod_status never DECLINEs if it is the handler.  mod_info DECLINEs if 
> the method isn't GET.  Let me see what happens if I send mod_info some 
> other method.

I sent mod_info a POST and got back a 404 with a "File does not exist" in the 
error log, both with and without this patch.

Greg


Re: [PATCH] block speed up

Posted by gr...@apache.org.
Geoffrey Young wrote:
> Bill Stoddard wrote:

>>My not so well formed thoughts are that if a module claims it should
>>handle a request based on a SetHandler directive in a Location
>>directive, the server should not allow that handler to DECLINE the
>>request. Putting it another way, if the handler claims the request then
>>DECLINEs, the server should prevent other handlers from attempting to
>>serve the request.
> 
> 
> I've brought this up before, but if you shortcut translation, how do you
> handle cases where r->handler is set after translation/map_to_storage.

I'd rather keep it simple and not change how those cases are processed.

> that is, just because at the start of a request <Location /foo> refers to a
> non-static resource, doesn't mean that assumption won't change.

<Location /foo> wasn't intended to be used for static files.  See 
http://httpd.apache.org/docs-2.0/mod/core.html#location.  If somebody wants to 
do something that strange, they can write their own map_to_storage function.

> gregames@apache.org wrote:
> 
>>A map_to_storage hook needs to know when to allow the directory walk and
>>when not to.
> 
> 
> I think I see what you mean.  so, perhaps the answer to my question above is
> that, with your patch, if modules expect to set r->handler under certain
> conditions, then they ought to install their own map_to_storage handler to
> handle those conditions as well?

yeah, that's my take.  If the module is sophisticated enough to know when to set 
r->handler without the core's help, I would assume it's sophisticated enough to 
have a custom map_to_storage function which knows when it is appropriate to 
alter normal processing.

Greg


Re: [PATCH] block speed up

Posted by Geoffrey Young <ge...@modperlcookbook.org>.
Bill Stoddard wrote:
>> mod_status and mod_info both are enabled via <Location > containers. 
>> mod_status never DECLINEs if it is the handler.  mod_info DECLINEs if
>> the method isn't GET.  Let me see what happens if I send mod_info some
>> other method.
> 
> 
> My not so well formed thoughts are that if a module claims it should
> handle a request based on a SetHandler directive in a Location
> directive, the server should not allow that handler to DECLINE the
> request. Putting it another way, if the handler claims the request then
> DECLINEs, the server should prevent other handlers from attempting to
> serve the request.

I've brought this up before, but if you shortcut translation, how do you
handle cases where r->handler is set after translation/map_to_storage.
that is, just because at the start of a request <Location /foo> refers to a
non-static resource, doesn't mean that assumption won't change.  see mod_dav
and mod_include, both of which set r->handler during the fixup phase.

gregames@apache.org wrote:
> A map_to_storage hook needs to know when to allow the directory walk and
> when not to.

I think I see what you mean.  so, perhaps the answer to my question above is
that, with your patch, if modules expect to set r->handler under certain
conditions, then they ought to install their own map_to_storage handler to
handle those conditions as well?

--Geoff


Re: [PATCH] block speed up

Posted by gr...@apache.org.
Bill Stoddard wrote:

>>> WHat happens if the handler DECLINES the request 
>>
>>
>>
>> My guess is that the default handler will be called, try to open a 
>> non-existant file and send back a 404.  I'll find out.
>>
> Make sure file 'silly' exists in documentroot and make sure it has 
> access protections coded in a Directory </> container :-)

oh my poor head!  I'm not too worried about people who do such things.

>> mod_status and mod_info both are enabled via <Location > containers.  
>> mod_status never DECLINEs if it is the handler.  mod_info DECLINEs if 
>> the method isn't GET.  Let me see what happens if I send mod_info some 
>> other method.
> 
> 
> My not so well formed thoughts are that if a module claims it should 
> handle a request based on a SetHandler directive in a Location 
> directive, the server should not allow that handler to DECLINE the 
> request. Putting it another way, if the handler claims the request then 
> DECLINEs, the server should prevent other handlers from attempting to 
> serve the request.

hmmm, that might simplify some potentially weird cases.

Greg


Re: [PATCH] block speed up

Posted by Bill Stoddard <bi...@wstoddard.com>.
gregames@apache.org wrote:

> Bill Stoddard wrote:
> 
>> WHat happens if the handler DECLINES the request 
> 
> 
> My guess is that the default handler will be called, try to open a 
> non-existant file and send back a 404.  I'll find out.
> 
Make sure file 'silly' exists in documentroot and make sure it has access protections coded in a Directory </> 
container :-)

>> and do we care?
> 
> 
> If it seg faults, violates protocol or something similar, I care.  Other 
> than that I don't think so.
> 
> mod_status and mod_info both are enabled via <Location > containers.  
> mod_status never DECLINEs if it is the handler.  mod_info DECLINEs if 
> the method isn't GET.  Let me see what happens if I send mod_info some 
> other method.

My not so well formed thoughts are that if a module claims it should handle a request based on a SetHandler 
directive in a Location directive, the server should not allow that handler to DECLINE the request. Putting it 
another way, if the handler claims the request then DECLINEs, the server should prevent other handlers from 
attempting to serve the request.

Bill


Re: [PATCH] block speed up

Posted by gr...@apache.org.
Bill Stoddard wrote:
> WHat happens if the handler DECLINES the request 

My guess is that the default handler will be called, try to open a non-existant 
file and send back a 404.  I'll find out.

> and do we care?

If it seg faults, violates protocol or something similar, I care.  Other than 
that I don't think so.

mod_status and mod_info both are enabled via <Location > containers.  mod_status 
never DECLINEs if it is the handler.  mod_info DECLINEs if the method isn't GET. 
  Let me see what happens if I send mod_info some other method.

Greg


Re: [PATCH] block speed up

Posted by Bill Stoddard <bi...@wstoddard.com>.
WHat happens if the handler DECLINES the request and do we care?

Bill

gregames@apache.org wrote:

> I'm interested in doing some scalability testing with worker on Linux to 
> see what the O(1) scheduler and new pthread library buys us, and what 
> happens with different values for ThreadsPerChild.  I decided to use a 
> simple handler that just nanosleep()s for a variable amount of time 
> controlled by the client.  I ran oprofile and saw that the kernel's 
> directory lookup function was the top CPU eater.  An strace revealed 
> that we were doing a directory walk, resulting in a stat() and 7 
> lstat()s per request.
> 
> My config to enable the handler is
> 
> <Location /silly>
>         SetHandler s2-handler
> </Location>
> 
> ...which is the same way we enable mod_status and mod_info.  The key 
> thing here is that the URIs to access a Location enabled handler do not 
> map to the filesystem, so the directory walk is a waste of cycles.  So 
> what can we do about it?  Maybe detect that the current URI matches such 
> a block, and skip the filesystem mapping if so.  That's exactly what 
> this patch does.
> 
> Throughput measured by ab increases from 3053 requests/sec to 3984 with 
> this patch, slightly more than 30% (dual 400MHz PII, both pegged, 256M 
> RAM, RH 7.3).  This should make it easier to see how the OS changes etc 
> affect our scalability.
> 
> I really don't grok how the cache works in ap_location_walk so I 
> wouldn't be shocked if I haven't implemented this optimally.  But it 
> seems to do the Right Thing with the nanosleep handler, mod_status, and 
> static files.  I also confirmed that we do get a full directory walk if 
> I type in a garbage URL.
> 
> Comments?
> Greg
> 
> 
> ------------------------------------------------------------------------
> 
> Index: include/http_core.h
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/include/http_core.h,v
> retrieving revision 1.77
> diff -u -d -b -r1.77 http_core.h
> --- include/http_core.h	1 Jan 2004 13:26:16 -0000	1.77
> +++ include/http_core.h	14 Jan 2004 22:48:09 -0000
> @@ -367,6 +367,8 @@
>       * won't actually be delivered as the response for the non-GET method.
>       */
>      int deliver_script;
> +    /* The URI matches a <Location > block containing a SetHandler */
> +    int uri_outside_filesystem;
>  } core_request_config;
>  
>  /* Standard entries that are guaranteed to be accessible via
> Index: server/core.c
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/server/core.c,v
> retrieving revision 1.256
> diff -u -d -b -r1.256 core.c
> --- server/core.c	8 Jan 2004 18:56:20 -0000	1.256
> +++ server/core.c	14 Jan 2004 22:48:09 -0000
> @@ -3349,6 +3349,9 @@
>  static int core_map_to_storage(request_rec *r)
>  {
>      int access_status;
> +    core_request_config *r_conf = ap_get_module_config(r->request_config,
> +                                                       &core_module);
> +    if (!r_conf->uri_outside_filesystem) {
>  
>      if ((access_status = ap_directory_walk(r))) {
>          return access_status;
> @@ -3356,6 +3359,7 @@
>  
>      if ((access_status = ap_file_walk(r))) {
>          return access_status;
> +    }
>      }
>  
>      return OK;
> Index: server/request.c
> ===================================================================
> RCS file: /home/cvs/httpd-2.0/server/request.c,v
> retrieving revision 1.132
> diff -u -d -b -r1.132 request.c
> --- server/request.c	1 Jan 2004 13:26:23 -0000	1.132
> +++ server/request.c	14 Jan 2004 22:48:09 -0000
> @@ -1330,9 +1330,22 @@
>       * and note the end result to (potentially) skip this step next time.
>       */
>      if (now_merged) {
> +        core_dir_config *core_dir_conf = ap_get_module_config(now_merged,
> +                                                              &core_module);
> +        core_request_config *r_conf = ap_get_module_config(r->request_config,
> +                                                           &core_module);
> +
>          r->per_dir_config = ap_merge_per_dir_configs(r->pool,
>                                                       r->per_dir_config,
>                                                       now_merged);
> +        if (core_dir_conf->handler && strcmp(core_dir_conf->handler,
> +                                             "none")) {
> +            /* this location block has a handler.  A pretty strong hint
> +             * that this URI doesn't live in the filesystem
> +             */
> +            r_conf->uri_outside_filesystem = 1;        
> +        }
> +            
>      }
>      cache->per_dir_result = r->per_dir_config;
>  


Re: [PATCH] block speed up

Posted by gr...@apache.org.
Geoffrey Young wrote:
>>...which is the same way we enable mod_status and mod_info.  The key
>>thing here is that the URIs to access a Location enabled handler do not
>>map to the filesystem, so the directory walk is a waste of cycles.  So
>>what can we do about it?  
> 
> 
> isn't that what map_to_storage is for?

A map_to_storage hook needs to know when to allow the directory walk and when 
not to.  I've seen some sophisicated modules with multiple hooks do this by 
parsing URIs against their own config.  But since we have several handlers that 
are enabled strictly by core config directives in the same way, it seems 
appropriate to use the existing core config and URI parsing to tell the core 
what to do.   Plus I don't think we would want to duplicate code in each module 
that uses <Location >

Greg


Re: [PATCH] block speed up

Posted by Geoffrey Young <ge...@modperlcookbook.org>.
> ...which is the same way we enable mod_status and mod_info.  The key
> thing here is that the URIs to access a Location enabled handler do not
> map to the filesystem, so the directory walk is a waste of cycles.  So
> what can we do about it?  

isn't that what map_to_storage is for?

http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=106506407907008&w=2

additional issues of interest can be found here

http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=106451559226339&w=2

HTH

--Geoff