Posted to dev@httpd.apache.org by "William A. Rowe, Jr." <wr...@rowe-clan.net> on 2001/12/12 20:27:07 UTC

Considering the Default Handler and Subrequests

All of the changes I'd like to introduce below impact one another, and
are based on several basic premises:


  1. A sub_req_lookup() caller (of any flavor) should be able to make
     heads or tails of the request based on the processing that occurs
     within ap_process_request_internal().

  2. No handler should die on known 'stuff', e.g. headers, etc, when those
     decisions can be made upfront, during the ap_process_request_internal()
     cycle.  A sub_req call can then choose some alternate resource (such as 
     going on to the next entry in the list of DirectoryIndex documents).  
     Dying for reasons we couldn't determine up front, such as mis-submitted
     POST data, always remains acceptable.

  3. Namespace pollution is evil.  Ergo, any module should reject requests
     for a document with path_info if it doesn't address the name space
     passed in path_info for the request.

This leads to several individual patches that I'm writing.

. My mod_dir patch which performs the entire internal handling of dirs within
  the fixup hook.  It has an associated patch to restore autoindexes' correct
  behavior {posted 12/2, Message-ID: <02...@v505>}

. Patch in HOOK_LAST of ap_run_fixups; punt the *handler to core if the thing is
  a file, and the proper path_info conditions exist, and punt no-file to NOTFOUND 
  and no-handler cases to 500 misconfigured.

. Hooks are _not_ the fastest things in the world, especially with the strcmps
  around ->handler going on.  If we resolve the ->handler up front, why not
  provide a ->handler_fn member that skips the entire handler() hook walk?

. Add DefaultHandler [AcceptPost|RejectPost] [AcceptPathInfo|RejectPathInfo]
  The default for both is Reject.  This setting is noted, but not acted upon,
  until we hit the HOOK_LAST of fixups.

. php and others may then toggle the setting for Post or PathInfo in their setup
  hooks, so they can supersede the user .conf setting or the defaults.

. move ap_run_insert_filter to the run_subreq code, so we get that possibly
  expensive hook into the content.  Since it can't fail, it doesn't need to sit
  up in the lookup logic.
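
A minimal C sketch of the HOOK_LAST fixups decision described in the list
above.  Everything here is a stand-in for illustration: the struct is not the
real request_rec, the "default-handler" name is hypothetical, and answering
404 for unwanted path_info is an assumption taken from later in this thread.
The order of the checks is also an assumption.

```c
#include <stddef.h>

/* Simplified stand-ins; not the real httpd status macros or request_rec. */
enum { SKETCH_OK = 0, SKETCH_NOT_FOUND = 404, SKETCH_SERVER_ERROR = 500 };

#define SKETCH_CORE_HANDLER "default-handler"  /* hypothetical name */

typedef struct {
    int is_file;            /* nonzero if the request mapped to a plain file */
    int exists;             /* nonzero if it mapped to anything at all */
    const char *path_info;  /* trailing path, or NULL/"" when there is none */
    const char *handler;    /* handler chosen by earlier hooks, or NULL */
    int accept_path_info;   /* the proposed AcceptPathInfo setting */
} sketch_request;

/* Last-chance fixup: runs after every other fixup has had its turn. */
int sketch_last_fixup(sketch_request *r)
{
    int has_path_info = r->path_info != NULL && r->path_info[0] != '\0';

    if (r->handler != NULL)
        return SKETCH_OK;               /* somebody already claimed it */
    if (!r->exists)
        return SKETCH_NOT_FOUND;        /* no file: NOTFOUND */
    if (r->is_file) {
        if (has_path_info && !r->accept_path_info)
            return SKETCH_NOT_FOUND;    /* path_info nobody asked for */
        r->handler = SKETCH_CORE_HANDLER;   /* punt to core */
        return SKETCH_OK;
    }
    return SKETCH_SERVER_ERROR;         /* exists, but no handler: 500 */
}
```

A caller doing a sub_req lookup could inspect the returned status and move on
to the next DirectoryIndex candidate without ever running a handler.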

Comments?

Bill


Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Ryan Bloom" <rb...@covalent.net>
Sent: Thursday, December 13, 2001 4:00 PM


> If you want to make this work for the most modules possible, create a single
> model that all modules must implement, and make that perform the best
> that it can.

That's exactly what I'm saying ... 1.3 had two flavors, by-name and by-pattern,
and threw them all into one heap.  Do we agree we can ignore the by-pattern
cruft, add a handler_name arg to an ap_register_handler() fn, and invoke by
exact match one specific handler?

The other case (by-pattern) should have been treated separately in the first
place.  Rather than waiting for the handler phase, the module either gets
invoked by an explicit AddHandler/SetHandler directive, or the very sophisticated
module can play games in the fixups phase, changing the handler string to itself
when it is 'interested' in taking the request.
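
As a rough sketch of that second case, a pattern-matching module could claim
the request in fixups by writing its own name into the handler string.  The
struct, the "gif-engine" name and the suffix test are all hypothetical; a real
module would work on request_rec and return the usual hook status codes.

```c
#include <stddef.h>
#include <string.h>

#define GIF_HANDLER_NAME "gif-engine"   /* hypothetical handler name */

/* Simplified stand-in for the request; not the real request_rec. */
typedef struct {
    const char *filename;   /* resolved filesystem path, or NULL */
    const char *handler;    /* handler name chosen so far, or NULL */
} gif_request;

static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Fixup sketch: returns 1 if the module claimed the request, 0 otherwise. */
int gif_fixup(gif_request *r)
{
    if (r->handler != NULL || r->filename == NULL)
        return 0;                       /* already claimed, or nothing to do */
    if (!ends_with(r->filename, ".gif"))
        return 0;                       /* not interested in this request */
    r->handler = GIF_HANDLER_NAME;      /* exact-match dispatch takes over */
    return 1;
}
```

The handler phase then needs only the exact-name lookup; the pattern logic
stays inside the module that cares about it.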

Is this an agreeable pattern to everyone?

Bill


Re: Considering the Default Handler and Subrequests

Posted by Ryan Bloom <rb...@covalent.net>.
On Thursday 13 December 2001 01:44 pm, Greg Ames wrote:
> Ryan Bloom wrote:
> > I still think the handler_fn function is overkill.  The performance of
> > Apache 1.3 wasn't bad, because we did sane string compares, making sure
> > that the lengths were equal before doing a full strcmp.
>
> You're right, 1.3 was faster than the way we do it now.  But we could do
> better still.
>
> I was thinking of something like a hash or a trie search for exact
> matches on r->handler.  But then OtherBill got me thinking that if a
> module decided that it could serve the request in some earlier phase, it
> ought to be able to do something to latch on to the handler phase and
> eliminate most of the searching altogether.

The problem with this, as I just explained to Bill on the phone, is that you 
are providing two mechanisms to do the same thing.  You are trying to solve
a problem in the core by making it harder to write a performant module.  The
harder it is to write a performant module, the fewer people will do so.

If you want to make this work for the most modules possible, create a single
model that all modules must implement, and make that perform the best
that it can.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Greg Ames" <gr...@remulak.net>
Sent: Thursday, December 13, 2001 3:44 PM


> Ryan Bloom wrote:
> 
> > I still think the handler_fn function is overkill.  The performance of Apache 1.3
> > wasn't bad, because we did sane string compares, making sure that the
> > lengths were equal before doing a full strcmp.
> 
> You're right, 1.3 was faster than the way we do it now.  But we could do
> better still.  
> 
> I was thinking of something like a hash or a trie search for exact
> matches on r->handler.  But then OtherBill got me thinking that if a
> module decided that it could serve the request in some earlier phase, it
> ought to be able to do something to latch on to the handler phase and
> eliminate most of the searching altogether.  

If we do implement this with a hash, to a static handler name (drop this
whole idea of pattern matches), then I think the one solution makes the
most sense.  Leave it a string for the common case, and those few that
are setting themselves up based on wildcards can simply set the r->handler
string.
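
For illustration only, an exact-match registry keyed on the handler name might
look like the sketch below.  It uses a small fixed-size open-addressing table
rather than any APR facility, and every name in it (sketch_register_handler,
sketch_find_handler, the table size) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn_t)(void *request);

#define TABLE_SIZE 64   /* power of two; fixed size keeps the sketch short */

typedef struct {
    const char *name;   /* NULL marks an empty slot; caller owns the string */
    handler_fn_t fn;
} handler_slot;

static handler_slot handler_table[TABLE_SIZE];

static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* One call per handler name.  Returns 0 on success, -1 if the name is
 * already registered or the table is full. */
int sketch_register_handler(const char *name, handler_fn_t fn)
{
    unsigned long i = hash_name(name) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        if (handler_table[i].name == NULL) {
            handler_table[i].name = name;
            handler_table[i].fn = fn;
            return 0;
        }
        if (strcmp(handler_table[i].name, name) == 0)
            return -1;
        i = (i + 1) & (TABLE_SIZE - 1);     /* linear probing */
    }
    return -1;
}

/* Exact-match lookup on the handler string; NULL when nothing matches. */
handler_fn_t sketch_find_handler(const char *name)
{
    if (name == NULL)
        return NULL;
    unsigned long i = hash_name(name) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        if (handler_table[i].name == NULL)
            return NULL;
        if (strcmp(handler_table[i].name, name) == 0)
            return handler_table[i].fn;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return NULL;
}

/* Placeholder handler so the registry has something to store. */
int sketch_cgi_handler(void *request)
{
    (void)request;
    return 0;
}
```

With this shape a wildcard module never touches the table; it only sets the
handler string to a name that was registered once at startup.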

Bill


Re: Considering the Default Handler and Subrequests

Posted by Greg Ames <gr...@remulak.net>.
Ryan Bloom wrote:

> 
> I still think the handler_fn function is overkill.  The performance of Apache 1.3
> wasn't bad, because we did sane string compares, making sure that the
> lengths were equal before doing a full strcmp.

You're right, 1.3 was faster than the way we do it now.  But we could do
better still.  

I was thinking of something like a hash or a trie search for exact
matches on r->handler.  But then OtherBill got me thinking that if a
module decided that it could serve the request in some earlier phase, it
ought to be able to do something to latch on to the handler phase and
eliminate most of the searching altogether.  

Greg

Re: Considering the Default Handler and Subrequests

Posted by Ryan Bloom <rb...@covalent.net>.
On Thursday 13 December 2001 01:07 pm, William A. Rowe, Jr. wrote:
> Blending Ryan's and Greg's observations with my own...
>
> It probably makes more sense to register the supported handlers (one
> call per handler/name) with a handler name to follow the 1.3 convention.
> For all simple cases, this is probably best.
>
> Only modules with interesting characteristics (not foo-bar names, but
> rather */* matches, such as a .gif file processing engine or something
> like that) would want to try grabbing handler_fn along the request.

I still think the handler_fn function is overkill.  The performance of Apache 1.3
wasn't bad, because we did sane string compares, making sure that the
lengths were equal before doing a full strcmp.
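
The length-first comparison can be sketched in C as below.  This illustrates
the idea only and is not the 1.3 source; the named_handler struct and the
function name are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A registered handler name with its length cached at registration time. */
typedef struct {
    const char *name;
    size_t len;         /* strlen(name), computed once */
} named_handler;

/* Cheap rejection first: most candidates differ in length, so the byte
 * comparison only runs for the few names of the same size. */
int handler_name_matches(const named_handler *h,
                         const char *wanted, size_t wanted_len)
{
    return h->len == wanted_len && memcmp(h->name, wanted, wanted_len) == 0;
}
```

The caller computes the length of the request's handler string once per
request and reuses it for every candidate.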

Using the handler_fn is going to make writing a module VERY complex, because
we will have multiple ways to solve the same problem.  Either we use a
handler_fn, or we use a handler name, not both.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
Blending Ryan's and Greg's observations with my own...

It probably makes more sense to register the supported handlers (one
call per handler/name) with a handler name to follow the 1.3 convention.
For all simple cases, this is probably best.

Only modules with interesting characteristics (not foo-bar names, but 
rather */* matches, such as a .gif file processing engine or something
like that) would want to try grabbing handler_fn along the request.

Bill


Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Greg Ames" <gr...@remulak.net>
Sent: Thursday, December 13, 2001 1:55 PM


> "William A. Rowe, Jr." wrote:
> 
> > Second, autoindex should be a generator [handler].  mod_dir should _NOT_.  I spelled
> > out the reasons for that on 2001.12.02.  
> 
> OK, peace.  But since we were talking about how to drive handlers more
> efficiently, we need to think about the harder cases.  One of them will
> be how to deal with multiple handlers who will accept the same
> r->handler value.  

:)  We have a number of cases like this.  I'd believe that the simplest way
to handle them is to address the handler first in the translation phase (if
the request maps to a non-filesystem uri) or then in the type checker or fixup
based on the content (which is what the old system did 90% of the time anyways.)

Let's look at the full list of Apache handlers:

default_handler

  The fallback case, assigned in fixups if nobody else gloms onto a filesystem
  request (where r->finfo.filetype != APR_NOFILE).  Or it could be the product
  of handler_fn != NULL, and left at that.

file_cache_handler, handle_map_file

  Are identified in the translate_names phase (file found) and could be overridden
  with SetHandler/AddHandler in the type-checker phase.  Not really a generator,
  but these are alternates to the default (plain file) cases.

isapi_handler, cgi_handler, cgid_handler, asis_handler, imap_handler

  These can be claimed in the type-checker phase (SetHandler/AddHandler or
  by-content-type.)  All of these are truly handlers because they do very
  unique things with otherwise plain files.

display_info, status_handler 

  Are identified by the location_walk phase (SetHandler or error condition) 
  and really are generators.

ssl_hook_Handler

  A generator I suspect doesn't need to be, we should be allowed to error out
  earlier.  If this is not possible, it's easily identified very early on.
  [Just an "It's Broke, don't send plain requests on an SSL port!" response.]

autoindex_handler

  Can be claimed in the fixups phase if nobody else claims it, it is really
  a generator.

dir_handler, redirect_handler

  We shouldn't have to get this far, we should be able to return the redirect back
  in the fixups phase, as I've proposed for mod_dir.  This allows us to print some
  pretty href in the autoindex listing that would save us a roundtrip from the
  client, potentially.

proxy_handler

  Which is a real handler, and is easily identified up front, at least by the
  fixups phase even on internal proxy redirection.

action_handler

  A really tricky little beast, no?  If you want one to debate, here it is :)


So there's the recap.  I assert that we can decide exactly which handler ought
to take a request no later than the fixups phase.  Comments?

Bill



Re: Considering the Default Handler and Subrequests

Posted by Greg Ames <gr...@remulak.net>.
"William A. Rowe, Jr." wrote:

> > >  If we resolve the ->handler up front, why not
> > >   provide a ->handler_fn member that skips the entire handler() hook walk?
> >
> > The implementation would be interesting.  Consider mod_dir and
> > mod_autoindex.  Both can deal with DIR_MAGIC_TYPE, and both could be
> > present or absent.  But when both are present, the handler topological
> > sort rules must be respected so that handle_dir runs first.  How do you
> > propose dealing with that?  And what about the few handlers that support
> > fuzzy matches?
> 
> Couple of bits.  If we declare handler_fn identically to a hook_handler callback,
> > then we can maintain the semantics.  A handler could raise its hand, but later
> DECLINE.  If it DECLINEs, or handler_fn is NULL, then we proceed with the usual
> walk.
> 
> Second, autoindex should be a generator [handler].  mod_dir should _NOT_.  I spelled
> out the reasons for that on 2001.12.02.  

OK, peace.  But since we were talking about how to drive handlers more
efficiently, we need to think about the harder cases.  One of them will
be how to deal with multiple handlers who will accept the same
r->handler value.  

Greg

Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Ryan Bloom" <rb...@covalent.net>
Sent: Thursday, December 13, 2001 1:22 PM


> On Thursday 13 December 2001 11:05 am, William A. Rowe, Jr. wrote:
> >
> > Couple of bits.  If we declare handler_fn identically to a hook_handler
> > callback, then we can maintain the semantics.  A handler could raise its
> > hand, but later DECLINE.  If it DECLINEs, or handler_fn is NULL, then we
> > proceed with the usual walk.
> 
> Why don't we go back to the original model, where a handler function is 
> associated with a handler-name, and the core just calls the correct one?

And reintroduce more strcasecmps and fnmatches :-?  Let's drop handler and
leave nothing but handler_fn if you like.  Very deterministic.  It's bold,
it's brash, it's hip :)

Seriously, when on earth was it strictly associated with a module's name?
Both mod_mmap_static and core registered for "*/*", and core got it if mmap
didn't have it (DECLINED.)  It's always been a hook.

Bill



Re: Considering the Default Handler and Subrequests

Posted by Ryan Bloom <rb...@covalent.net>.
On Thursday 13 December 2001 11:05 am, William A. Rowe, Jr. wrote:
> From: "Greg Ames" <gr...@remulak.net>
> Sent: Thursday, December 13, 2001 12:49 PM
>
> > "William A. Rowe, Jr." wrote:
> > > . Hooks are _not_ the fastest things in the world, especially with the
> > > strcmps around ->handler going on.
> >
> > Amen!  This one has been bugging me for a long time.  It won't show up
> > clearly in a profiler, because the CPU cycles are spread over all the
> > handlers.  We are polluting the instruction cache by touching a lot of
> > separate chunks of code that only return DECLINED.
> >
> > >  If we resolve the ->handler up front, why not
> > >   provide a ->handler_fn member that skips the entire handler() hook
> > > walk?
> >
> > The implementation would be interesting.  Consider mod_dir and
> > mod_autoindex.  Both can deal with DIR_MAGIC_TYPE, and both could be
> > present or absent.  But when both are present, the handler topological
> > sort rules must be respected so that handle_dir runs first.  How do you
> > propose dealing with that?  And what about the few handlers that support
> > fuzzy matches?
>
> Couple of bits.  If we declare handler_fn identically to a hook_handler
> callback, then we can maintain the semantics.  A handler could raise its
> hand, but later DECLINE.  If it DECLINEs, or handler_fn is NULL, then we
> proceed with the usual walk.

Why don't we go back to the original model, where a handler function is 
associated with a handler-name, and the core just calls the correct one?

Ryan

>
> Second, autoindex should be a generator [handler].  mod_dir should _NOT_. 
> I spelled out the reasons for that on 2001.12.02.  It is part of the wide
> bogosity of dir requests 'slipping through' into autoindex.  mod_dir could
> only be considered a handler if it returns an external redirect; any other
> response it provides is an internal fixup that should happen in fixups.  If
> you try my mod_dir/autoindex patches of late 2001.12.02 [when I got around
> to fixing the implications of mod_dir as a fixup in mod_autoindex] you
> should discover it runs fine.  I had changed mod_dir to use our
> ap_fast_internal_redirect, so it will uncover more bugs than mod_negotiation
> does alone.
>
> Bill

-- 

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Greg Ames" <gr...@remulak.net>
Sent: Thursday, December 13, 2001 12:49 PM


> "William A. Rowe, Jr." wrote:
> 
> > . Hooks are _not_ the fastest things in the world, especially with the strcmps
> >   around ->handler going on.  
> 
> Amen!  This one has been bugging me for a long time.  It won't show up
> clearly in a profiler, because the CPU cycles are spread over all the
> handlers.  We are polluting the instruction cache by touching a lot of
> separate chunks of code that only return DECLINED.
> 
> >  If we resolve the ->handler up front, why not
> >   provide a ->handler_fn member that skips the entire handler() hook walk?
> 
> The implementation would be interesting.  Consider mod_dir and
> mod_autoindex.  Both can deal with DIR_MAGIC_TYPE, and both could be
> present or absent.  But when both are present, the handler topological
> sort rules must be respected so that handle_dir runs first.  How do you
> propose dealing with that?  And what about the few handlers that support
> fuzzy matches?

Couple of bits.  If we declare handler_fn identically to a hook_handler callback,
then we can maintain the semantics.  A handler could raise its hand, but later
DECLINE.  If it DECLINEs, or handler_fn is NULL, then we proceed with the usual
walk.
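
A small C sketch of those semantics, under stated assumptions: handler_fn is
the proposed (not existing) request member, the struct is a stand-in for
request_rec, and the hook list is a plain array rather than the real hook
machinery.  The status values mirror httpd's OK (0) and DECLINED (-1).

```c
#include <stddef.h>

#define SK_OK        0
#define SK_DECLINED (-1)

typedef struct fast_request fast_request;
typedef int (*fast_handler_fn)(fast_request *r);

/* Simplified stand-in for the request; not the real request_rec. */
struct fast_request {
    fast_handler_fn handler_fn; /* proposed member; NULL if nobody claimed */
    int walked;                 /* set when the full hook walk was needed */
};

/* Try the pre-resolved handler first; fall back to the usual walk if it
 * is absent or changes its mind and declines. */
int run_handler(fast_request *r, const fast_handler_fn *hooks, size_t nhooks)
{
    if (r->handler_fn != NULL) {
        int rv = r->handler_fn(r);
        if (rv != SK_DECLINED)
            return rv;
    }
    r->walked = 1;
    for (size_t i = 0; i < nhooks; i++) {
        int rv = hooks[i](r);
        if (rv != SK_DECLINED)
            return rv;
    }
    return SK_DECLINED;
}

/* Placeholder handlers for trying the sketch out. */
int sample_decline(fast_request *r) { (void)r; return SK_DECLINED; }
int sample_ok(fast_request *r)      { (void)r; return SK_OK; }
```

Note that a handler which raised its hand and then declined may be called a
second time during the walk if it is also in the hook list; a real
implementation would have to decide whether that is acceptable.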

Second, autoindex should be a generator [handler].  mod_dir should _NOT_.  I spelled
out the reasons for that on 2001.12.02.  It is part of the wide bogosity of dir
requests 'slipping through' into autoindex.  mod_dir could only be considered a
handler if it returns an external redirect; any other response it provides is an
internal fixup that should happen in fixups.  If you try my mod_dir/autoindex patches
of late 2001.12.02 [when I got around to fixing the implications of mod_dir as a fixup
in mod_autoindex] you should discover it runs fine.  I had changed mod_dir to use our
ap_fast_internal_redirect, so it will uncover more bugs than mod_negotiation does alone.

Bill



Re: Considering the Default Handler and Subrequests

Posted by Greg Ames <gr...@remulak.net>.
"William A. Rowe, Jr." wrote:

> . Hooks are _not_ the fastest things in the world, especially with the strcmps
>   around ->handler going on.  

Amen!  This one has been bugging me for a long time.  It won't show up
clearly in a profiler, because the CPU cycles are spread over all the
handlers.  We are polluting the instruction cache by touching a lot of
separate chunks of code that only return DECLINED.

>  If we resolve the ->handler up front, why not
>   provide a ->handler_fn member that skips the entire handler() hook walk?

The implementation would be interesting.  Consider mod_dir and
mod_autoindex.  Both can deal with DIR_MAGIC_TYPE, and both could be
present or absent.  But when both are present, the handler topological
sort rules must be respected so that handle_dir runs first.  How do you
propose dealing with that?  And what about the few handlers that support
fuzzy matches?

Greg

Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Rodent of Unusual Size" <Ke...@Golux.Com>
Sent: Wednesday, December 12, 2001 2:34 PM


> "William A. Rowe, Jr." wrote:
> > 
> > But if you never use path_info, it should be rejected to
> > prevent exactly this sort of pollution/infinite recursion.
> 
> Ah, so you're suggesting that anything that actually maps
> a URI to a document, but which doesn't do anything with
> path-info, should return 404 if the request includes some?

Ack.


Re: Considering the Default Handler and Subrequests

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
"William A. Rowe, Jr." wrote:
> 
> But if you never use path_info, it should be rejected to
> prevent exactly this sort of pollution/infinite recursion.

Ah, so you're suggesting that anything that actually maps
a URI to a document, but which doesn't do anything with
path-info, should return 404 if the request includes some?
-- 
#ken	P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Author, developer, opinionist      http://Apache-Server.Com/

"All right everyone!  Step away from the glowing hamburger!"

Re: Considering the Default Handler and Subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
From: "Rodent of Unusual Size" <Ke...@Golux.Com>
Sent: Wednesday, December 12, 2001 2:07 PM


> "William A. Rowe, Jr." wrote:
> > 
> > 3. Namespace pollution is evil.  Ergo, any module should reject requests
> >    for a document with path_info if it doesn't address the name space
> >    passed in path_info for the request.
> 
> Can you explain this in different words?  And/or maybe an example?

Create a document /somepath that accepts path_info but does nothing with it.

/somepath
/somepath/0
/somepath/1
...
/somepath/9
/somepath/a
/somepath/b
...
/somepath/z
...
/somepath/0/0
/somepath/0/1
... ad infinitum

ALL MAP to /somepath.  As far as any indexing engine is concerned, there are
an infinite number of discrete pages.  This is what makes path_info mappings
such a problem.

In viewcvs.cgi, we have a discrete number of accepted path_info mappings, all
of which point at specific files of a repository.  There are a number of good
path_info uses in mapping a file-based 'program' to a virtual space, such as
the contents of an archive file, a repository, or so forth.

But if you never use path_info, it should be rejected to prevent exactly this
sort of pollution/infinite recursion.
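
To make the distinction concrete, here is a hedged C sketch of a resource that
accepts only a discrete set of path_info values and rejects everything else
with 404.  The function name, the status constants and the idea of passing
the accepted values as an array are all assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { PI_OK = 0, PI_NOT_FOUND = 404 };

/* Accept an empty path_info, or one that names something this resource
 * really serves; anything else is outside its name space. */
int check_path_info(const char *path_info,
                    const char *const *known, size_t nknown)
{
    if (path_info == NULL || path_info[0] == '\0')
        return PI_OK;
    for (size_t i = 0; i < nknown; i++) {
        if (strcmp(path_info, known[i]) == 0)
            return PI_OK;
    }
    return PI_NOT_FOUND;
}
```

A document that never uses path_info passes an empty list, so /somepath/0,
/somepath/0/1 and the rest all come back as 404 instead of aliasing /somepath.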

Bill



Re: Considering the Default Handler and Subrequests

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.
"William A. Rowe, Jr." wrote:
> 
> 3. Namespace pollution is evil.  Ergo, any module should reject requests
>    for a document with path_info if it doesn't address the name space
>    passed in path_info for the request.

Can you explain this in different words?  And/or maybe an example?
-- 
#ken	P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Author, developer, opinionist      http://Apache-Server.Com/

"All right everyone!  Step away from the glowing hamburger!"