Posted to dev@httpd.apache.org by Alexei Kosut <ak...@nueva.pvt.k12.ca.us> on 1997/06/10 00:35:03 UTC

Thoughts on a 2.0 API

Here are some thoughts I've had on the design of an API for Apache
2.0. Please feel free to ignore it. I'm mainly talking here, fyi,
about the API for handling the request. I haven't thought much about
module loading, configuration handling, that sort of thing, and I
believe others are. What I've been pondering is the best way to handle
the "phases" (I'll use this term throughout; other people use
different words, this is mine) of the request.

Currently, Apache has the following phases:

1. URL->filename translation
2. header parse
3. access check
4. user id check
5. auth check
6. type check
7. fixup phase
8. handler phase
9. logging phase

In earlier discussions, assuming we keep a similar request model in
2.0 (which I think is likely), we've discussed having at least the
following phases:

1. "connection open" phase
2. begin request phase
3. URL->URL translation
4. URL->filename translation
5. filename->filename translation
6. header parse phase
7. access check
8. user id check
9. auth check
10. type check
11. fixup phase
12. handler phase
13. . pre-header phase
14. . post-header (pre-body) phase
15. end request phase
16. logging phase
17. end connection phase
18. "connection closed" phase

And I may be missing some we've discussed. The point is, there are
(will be) twice as many request phases in 2.0 as in 1.x. We've also
had problems with the nature of the current model. Currently, most
phases are run always, with the exception of the handler phase, which
is keyed off of handler/media type. Apache was designed mainly to
serve GET requests, and it doesn't do as well as it could when handling
POSTs, PUTs, and whatever else, especially with regard to
HTTP/1.1. It would be useful to know what methods a given handler
should handle *before* it's run. And some handlers in some modules
don't handle non-GETs correctly. There are other, similar problems.

Also, the phase functions in the module are static; they're determined
beforehand, and added into the server when the module is loaded. With
a lot of phases, and a lot of permutations of these, it may not be the
best idea to do it this way. For one thing, a module_rec entry with
twenty lines in each module just looks ugly, and adding more phases
only makes it worse, especially if there are phases not supported by the
API (this wouldn't happen with Apache, but if the API was cloned --
see my previous "Molly and the Apache API" email -- it might. Also,
certain phases might be turned off by the server for performance
reasons?). In a similar vein, each module having 20+ functions that
have to be checked and called for each request... if you have a lot of
modules, that could quite possibly slow down the server, especially
for modules that have features that are turned off by default.

What is the solution? How about making the API dynamic? A module_rec
might just have an initializer phase (or two - we've discussed
"run-on-fork" initializers as well as the "run-on-start" we have now),
and a command table (or something). The initialization function would
call things like:

   add_request_phase(&handler_func, HANDLER_PHASE, "text/html", M_GET|M_POST);
   add_request_phase(&type_func, CHECK_TYPE_PHASE, NULL, M_ANY);

These might also be called from command functions (i.e., putting in
the first "AddType" command would cause mod_mime to add a check_type
phase). Of course, these would be stored like per-dir configs are now,
and merged for the request. In fact, add_request_phase might even be
called *during* the request. For example, a connection-open phase that
activates SSL might add a connection-close phase that turned off SSL
(I don't know exactly how SSL works, so this might not be a good
example) -- this way, non-SSL requests wouldn't bother calling the
connection-close phase's function.
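
A minimal sketch of what such a dynamic registry might look like. Only
add_request_phase, the two phase names and the M_* names come from the
calls above; the types, the table layout, run_phase and count_call are
hypothetical, and per-dir merging is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: not an existing Apache interface. */
enum phase { CHECK_TYPE_PHASE, HANDLER_PHASE, NUM_PHASES };

#define M_GET  (1 << 0)
#define M_POST (1 << 1)
#define M_PUT  (1 << 2)
#define M_ANY  (~0)

typedef struct request {
    int method;               /* exactly one of the M_* bits */
    const char *content_type; /* media type of the response, may be NULL */
    int calls;                /* number of phase functions invoked (demo only) */
} request;

typedef int (*phase_fn)(request *r);

typedef struct phase_entry {
    phase_fn fn;
    const char *media_type;   /* NULL means "any media type" */
    int methods;              /* bitmask of M_* values */
} phase_entry;

#define MAX_ENTRIES 16
static phase_entry table[NUM_PHASES][MAX_ENTRIES];
static int count[NUM_PHASES];

/* Register fn for a phase. Returns 0 on success, -1 if it is rejected. */
int add_request_phase(phase_fn fn, int phase, const char *media_type, int methods)
{
    if (fn == NULL || phase < 0 || phase >= NUM_PHASES)
        return -1;
    if (count[phase] >= MAX_ENTRIES)
        return -1;
    phase_entry *e = &table[phase][count[phase]++];
    e->fn = fn;
    e->media_type = media_type;
    e->methods = methods;
    return 0;
}

/* Call every function registered for this phase whose method mask and
 * media type match the request. Returns how many functions ran. */
int run_phase(int phase, request *r)
{
    int ran = 0;
    if (phase < 0 || phase >= NUM_PHASES)
        return 0;
    for (int i = 0; i < count[phase]; i++) {
        const phase_entry *e = &table[phase][i];
        if (!(e->methods & r->method))
            continue;
        if (e->media_type != NULL &&
            (r->content_type == NULL ||
             strcmp(e->media_type, r->content_type) != 0))
            continue;
        e->fn(r);
        ran++;
    }
    return ran;
}

/* Demo phase function: just counts that it was called. */
int count_call(request *r)
{
    r->calls++;
    return 0;
}
```

With this layout a phase with no registered functions costs one loop
test, and a PUT request never reaches a handler registered only for
M_GET|M_POST.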

This would also allow the server to optimize its request handling;
because the phase functions would be distinct from the modules, it
would know that it didn't have any check access functions (for
example), so it wouldn't bother checking for them. It might also solve
the "which module comes first?" problem - maybe the add_request_phase
function might include a priority value (each priority would be
defined by the API, of course). Maybe even a run-all/run-one
indication, unlike now, where some phases run all of the functions,
and some run until they hit an OK (i.e., all the run-all functions
would run, then the run-one ones). And the server could also sort the
functions in an order that makes sense (this would apply probably only
to those functions added at server config, not during the request):
putting the wildcard handlers last, so it wouldn't have to run through
the list twice.
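
A rough sketch of the ordering idea, under assumptions of my own
choosing: run-all functions sort first, then run-one functions with a
specific media type, then wildcards, with the priority value breaking
ties inside each group. The hook struct, sort_hooks, run_hooks and the
demo functions are hypothetical, and matching against the request's
media type is omitted to keep it short.

```c
#include <stddef.h>
#include <stdlib.h>

#define OK 0
#define DECLINED (-1)

enum run_mode { RUN_ALL, RUN_ONE };

typedef int (*phase_fn)(void *req);

typedef struct hook {
    phase_fn fn;
    int priority;            /* lower values run earlier */
    enum run_mode mode;
    const char *media_type;  /* NULL means wildcard */
} hook;

/* Order: run-all first, then specific before wildcard, then by priority. */
static int hook_cmp(const void *a, const void *b)
{
    const hook *x = a, *y = b;
    if (x->mode != y->mode)
        return x->mode == RUN_ALL ? -1 : 1;
    int xw = (x->media_type == NULL), yw = (y->media_type == NULL);
    if (xw != yw)
        return xw - yw;
    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Sort once at configuration time. */
void sort_hooks(hook *hooks, size_t n)
{
    qsort(hooks, n, sizeof *hooks, hook_cmp);
}

/* Single pass over a sorted list: every run-all function is called,
 * then run-one functions are tried until one returns OK. */
int run_hooks(const hook *hooks, size_t n, void *req)
{
    int result = DECLINED;
    for (size_t i = 0; i < n; i++) {
        if (hooks[i].mode == RUN_ALL) {
            hooks[i].fn(req);
            continue;
        }
        result = hooks[i].fn(req);
        if (result == OK)
            break;
    }
    return result;
}

/* Demo helpers: each function appends one letter to a trace buffer. */
typedef struct trace { char buf[16]; size_t len; } trace;

static int note(void *req, char c, int rc)
{
    trace *t = req;
    if (t->len + 1 < sizeof t->buf) {
        t->buf[t->len++] = c;
        t->buf[t->len] = '\0';
    }
    return rc;
}

int always(void *req)    { return note(req, 'A', DECLINED); }
int decline(void *req)   { return note(req, 'd', DECLINED); }
int accept_it(void *req) { return note(req, 'k', OK); }
int wildcard(void *req)  { return note(req, 'w', OK); }
```

Because the wildcard entries sort last, one walk down the list is
enough; a specific handler that returns OK stops the walk before any
wildcard is consulted.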

It would also allow (as I mentioned earlier), an implementation of the
API to leave stuff out. For example, if the server didn't support the
CHECK_ACCESS phase, it could return -1 from the add_request_phase()
function. The module would check the return value, and if it was
non-zero, would return an error or something (although I must admit,
exceptions would be nice here). This would also allow us (the Apache
Group) to add additional phases without breaking backwards and
forwards compatibility for existing modules. In fact, I think that
should be a goal for 2.0 in general - we should set up rules about
changing/adding/deleting "public" functions/structures/variables so
that binary compatibility is sacrificed as little as possible. Again,
see my "Molly and the Apache API" for why this is a Good Idea.
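
To make the return-value check concrete, here is a small sketch of a
server that lacks one phase and two module initializers that react to
that. The two-argument add_request_phase is a simplification for this
example, and the phase_supported table, the module names and the init
functions are all hypothetical.

```c
#include <stddef.h>

enum phase { CHECK_ACCESS_PHASE, CHECK_TYPE_PHASE, HANDLER_PHASE, NUM_PHASES };

typedef int (*phase_fn)(void *req);

/* Hypothetical server build that does not implement CHECK_ACCESS_PHASE. */
static const int phase_supported[NUM_PHASES] = { 0, 1, 1 };
static phase_fn registered[NUM_PHASES];

/* Returns 0 on success, -1 if the phase is unknown or unsupported here. */
int add_request_phase(phase_fn fn, int phase)
{
    if (phase < 0 || phase >= NUM_PHASES || !phase_supported[phase])
        return -1;
    registered[phase] = fn;
    return 0;
}

static int check_access(void *req) { (void)req; return 0; }
static int find_type(void *req)    { (void)req; return 0; }

/* An access-control module cannot do its job without its phase,
 * so its initializer reports failure to the server. */
int access_module_init(void)
{
    if (add_request_phase(check_access, CHECK_ACCESS_PHASE) != 0)
        return -1;
    return 0;
}

/* A type-checking module only needs a phase this server does provide. */
int mime_module_init(void)
{
    if (add_request_phase(find_type, CHECK_TYPE_PHASE) != 0)
        return -1;
    return 0;
}
```

A phase number the server has never heard of is rejected the same way
as one it chose not to support, which is what lets an older server
load a module written against a newer phase list.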

Anyway, that's an idea I had. I haven't had much experience with
software engineering on this scale, so there's a good chance that it's
a pathetically stupid idea. If so, could someone please tell me?

-- 
________________________________________________________________________
Alexei Kosut <ak...@nueva.pvt.k12.ca.us>      The Apache HTTP Server
URL: http://www.nueva.pvt.k12.ca.us/~akosut/   http://www.apache.org/

Re: Thoughts on a 2.0 API

Posted by Alexei Kosut <ak...@nueva.pvt.k12.ca.us>.

On Tue, 10 Jun 1997, Rob Hartill wrote:

> So, I think it's better to throw out the concept of named phases.
> By all means start from the ordered phases we have and propose, but
> don't enforce that order on everyone implicitly or explicitly. I should
> be allowed to juggle any of the core handlers around to achieve my
> objectives without feeling guilty about it.

Hmm. I disagree. There need to be named phases, because a request is
linear, and things happen: connection open, request read, response
sent, connection closed. Within those four general areas, things
happen - again in certain orders. 

So while I think it might be useful to get rid of "named phases" in
the sense of "this phase is for determining the media type of the
document", it is not useful in the sense of "this phase occurs after
the request has been read, but before the response headers have been
sent."

In addition, I'm a bit scared of all this talk about expanded
configuration. While I support it, I don't want to turn Apache's
config into sendmail.cf, where details on how to run the protocol part
of the server are configurable by necessity. In other words, most
people just modify the first few lines of their sendmail.cf and let it
run. But if you want to do anything special, you have to understand
the rest, and be able to modify it (I know I'm oversimplifying
here). I want the user to continue to be able to add complex
functionality *without* having to know anything about how the request
works, and ordering of API phases, and whatever else.

I think the API may *need* named phases, even though two paragraphs
ago I said it might not. For example, in order for more than one
authentication module to work correctly, they all need to follow a
similar approach. Which they do now, in the current API. With Rob's
approach, they might not.

I see two different uses for the API: Firstly, when writing your own
custom server, and you have control over all your modules, and know
exactly how they all work, and how you want to deal with them. The
other approach is the one I've been talking about lately, the
"plug-in" theory, where a server admin can take the server, and add in
modules written by others that perform certain functions, and it all
works seamlessly, without any configuration. In other words, I
download a binary of the server, drop precompiled object files for
PHP, mod_perl and a log file analyzer I bought from a mail order
catalog into a "modules" directory, and the server just does it with
minimal input from me.

The API I proposed works very well (IMHO) in supporting this second
theory of server use, but not as well with the first. Ben and Rob's
ideas probably work better with the first use, but not the second. We
need to work out how to make both work in Apache 2.0.

-- 
________________________________________________________________________
Alexei Kosut <ak...@nueva.pvt.k12.ca.us>      The Apache HTTP Server
URL: http://www.nueva.pvt.k12.ca.us/~akosut/   http://www.apache.org/

Re: Thoughts on a 2.0 API

Posted by Rob Hartill <ro...@imdb.com>.

Some responses, but I admit I've mostly hijacked Alexei's post to
throw in some of my own thoughts and ideas.

On Mon, 9 Jun 1997, Alexei Kosut wrote:

> Here are some thoughts I've had on the design of an API for Apache
> 2.0. Please feel free to ignore it.

> In earlier discussions, assuming we keep a similar request model in
> 2.0 (which I think is likely), we've discussed having at least the
> following phases:
> 
> 1. "connection open" phase
> 2. begin request phase
> 3. URL->URL translation
> 4. URL->filename translation
> 5. filename->filename translation
> 6. header parse phase

It might be confusing terminology, but I think 'header parsing'
belongs before 3.

I think we need to collect as much information about the request
as early as possible so that any translations that take place can use
that information. e.g. if a header specifies a preference for Greek
over English language then the URL->->->filename translation phase should
be able to act on that.

> 7. access check
> 8. user id check
> 9. auth check
> 10. type check
> 11. fixup phase
> 12. handler phase
> 13. . pre-header phase
> 14. . post-header (pre-body) phase
> 15. end request phase
> 16. logging phase
> 17. end connection phase
> 18. "connection closed" phase
> 
> And I may be missing some we've discussed.

Those should be considered some of the basics. I'd also like to be
able to squeeze any handler I come up with in between any of these
listed above, e.g. I might want to add a handler near phase 2 that
just aborts the connection if the incoming IP is a nuisance site.

> The point is, there are
> (will be) twice as many request phases in 2.0 as in 1.x. We've also
> had problems with the nature of the current model. Currently, most
> phases are run always, with the exception of the handler phase, which
> is keyed off of handler/media type.

Assuming we end up with a form of stacked handlers, I think it'd
be useful to allow any handler to switch off other handlers that follow
it for any particular request. To do that a way is needed for handlers
to uniquely identify themselves.

> What is the solution? How about making the API dynamic? A module_rec
> might just have an initializer phase (or two - we've discussed
> "run-on-fork" initializers as well as the "run-on-start" we have now),
> and a command table (or something). The initialization function would
> call things like:
> 
>    add_request_phase(&handler_func, HANDLER_PHASE, "text/html", M_GET|M_POST);
>    add_request_phase(&type_func, CHECK_TYPE_PHASE, NULL, M_ANY);
> 
> These might also be called from command functions (i.e., putting in
> the first "AddType" command would cause mod_mime to add a check_type
> phase).

mod_perl does something similar. You can push a list of functions (handlers)
into any of the current phases. What I've noticed about the way I (ab)use
that system is that categorising the phases breaks down into a free for
all when you decide at exactly which point you want the function called.
... you end up thinking to hell with what the phase is called and
supposed to do, I want my handler *there*.

If I want to call a bit of code very early then I currently stick it in
at the headerparser phase even though it has nothing to do with parsing
headers as such.

So, I think it's better to throw out the concept of named phases.
By all means start from the ordered phases we have and propose, but
don't enforce that order on everyone implicitly or explicitly. I should
be allowed to juggle any of the core handlers around to achieve my
objectives without feeling guilty about it.

> Of course, these would be stored like per-dir configs are now,
> and merged for the request. In fact, add_request_phase might even be
> called *during* the request. For example, a connection-open phase that
> activates SSL might add a connection-close phase that turned off SSL
> (I don't know exactly how SSL works, so this might not be a good
> example) -- this way, non-SSL requests wouldn't bother calling the
> connection-close phase's function.

I think mod_perl allows a handler to push other handlers onto a stack
at runtime. That's a good idea and one I'd like to see in the core.
The trick is to allow the handler to be "pushed" (perhaps a bad description
because it implies a 'stack', how about "shoved" :-) into any point of
the request's chain (linked list ?).
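
As a sketch of the "shove" operation on a linked list, assuming each
handler carries a unique name so that others can refer to it. The
node struct, shove_handler and free_chain are hypothetical names made
up for this example; this is not how mod_perl implements it.

```c
#include <stdlib.h>
#include <string.h>

typedef struct node {
    const char *name;          /* unique identifier for the handler */
    int (*fn)(void *req);
    struct node *next;
} node;

/* Insert a new handler in front of the one named `before`. If `before`
 * is NULL or not found, the handler goes at the end of the chain.
 * Returns 0 on success, -1 if memory could not be allocated. */
int shove_handler(node **head, const char *before,
                  const char *name, int (*fn)(void *req))
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->name = name;
    n->fn = fn;

    /* Walk the links themselves so inserting at the head needs no special case. */
    node **link = head;
    while (*link != NULL &&
           (before == NULL || strcmp((*link)->name, before) != 0))
        link = &(*link)->next;

    n->next = *link;
    *link = n;
    return 0;
}

/* Release every node in the chain. */
void free_chain(node *head)
{
    while (head != NULL) {
        node *next = head->next;
        free(head);
        head = next;
    }
}
```

A per-request copy of the chain would let a handler shove something
in at runtime without affecting other requests.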

> This would also allow the server to optimize its request handling;
> because the phase functions would be distinct from the modules, it
> would know that it didn't have any check access functions (for
> example), so it wouldn't bother checking for them. It might also solve
> the "which module comes first?" problem - maybe the add_request_phase
> function might include a priority value (each priority would be
> defined by the API, of course). Maybe even a run-all/run-one
> indication, unlike now, where some phases run all of the functions,
> and some run until they hit an OK (i.e., all the run-all functions
> would run, then the run-one ones).

this could be done if handlers are allowed to switch off later
handlers by having flags for different 'groups' of handler. The
flags can be carried in the request record and any group of handlers
should be able to define new flags and other shareable data in the
request record. The idea being to let handlers "communicate" information
among themselves without needing extra variables/code in the core
to organise things. A common set of these flags could be used, say, to
tell ALL subsequent handlers other than output and logging to skip
processing the request. Some handlers could default to skipping
everything unless they see some flag set...

basically, chain handlers together in any order the user wants,
let them talk to each other to decide who does what and who does nothing
and keep the core out of the way as much as possible.
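
A small sketch of the flag idea, assuming handlers are tagged with a
group bit and the request record carries a mask of groups that have
been switched off. The struct layouts, the GROUP_* names and the three
demo handlers are hypothetical.

```c
#include <stddef.h>

#define GROUP_ACCESS  (1u << 0)
#define GROUP_CONTENT (1u << 1)
#define GROUP_LOGGING (1u << 2)

typedef struct request {
    unsigned skip_groups;    /* groups that earlier handlers switched off */
    int from_nuisance_site;  /* input for the demo: is the client unwelcome? */
    int content_runs;        /* how often the content handler ran */
    int log_runs;            /* how often the logger ran */
} request;

typedef struct handler {
    const char *name;
    unsigned group;
    void (*fn)(request *r);
} handler;

/* The core only walks the chain and honours the skip mask;
 * all decisions are made by the handlers themselves. */
void run_chain(const handler *chain, size_t n, request *r)
{
    for (size_t i = 0; i < n; i++) {
        if (chain[i].group & r->skip_groups)
            continue;
        chain[i].fn(r);
    }
}

/* Early handler: for a nuisance client, switch off everything
 * that follows except logging. */
void block_nuisance(request *r)
{
    if (r->from_nuisance_site)
        r->skip_groups |= GROUP_ACCESS | GROUP_CONTENT;
}

void send_content(request *r) { r->content_runs++; }
void write_log(request *r)    { r->log_runs++; }
```

The opposite default, where a handler does nothing unless some flag
is set, would only need the test in run_chain inverted for that group.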

--
Rob Hartill                              Internet Movie Database (Ltd)
http://www.moviedatabase.com/   .. a site for sore eyes.

Re: Thoughts on a 2.0 API

Posted by Dean Gaudet <dg...@arctic.org>.

On Mon, 9 Jun 1997, Alexei Kosut wrote:
>  Along a similar vein, each module having 20+ functions that
> have to be checked and called for each request... if you have a lot of
> modules, that could quite possibly slow down the server. Especially
> for modules that have features that are by default turned off.

I'm ok with this because I'm assuming that some variant of
http://www.arctic.org/~dgaudet/patches/apache-1.2-run_method-performance.patch
will be used.  This patch optimizes out all the NULLs and the result is
noticeably faster.
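
For readers who have not seen the patch, the general technique can be
sketched roughly as follows. This is not the patch itself: the struct
names, build_vectors and this run_method signature are assumptions
made for illustration. The idea is to scan the static module table
once at startup and keep, per phase, a dense list of only the non-NULL
functions.

```c
#include <stddef.h>

#define NUM_PHASES 3
#define MAX_MODULES 8

typedef int (*phase_fn)(void *req);

/* Static per-module table, with NULL where a module has no function. */
typedef struct module_rec {
    const char *name;
    phase_fn phases[NUM_PHASES];
} module_rec;

/* Dense per-phase list built once at startup. */
typedef struct phase_vector {
    phase_fn fns[MAX_MODULES];
    size_t n;
} phase_vector;

/* Collect the non-NULL entries of every module, phase by phase. */
void build_vectors(const module_rec *mods, size_t nmods,
                   phase_vector out[NUM_PHASES])
{
    for (int p = 0; p < NUM_PHASES; p++) {
        out[p].n = 0;
        for (size_t m = 0; m < nmods && out[p].n < MAX_MODULES; m++) {
            if (mods[m].phases[p] != NULL)
                out[p].fns[out[p].n++] = mods[m].phases[p];
        }
    }
}

/* Per request: no NULL tests, just call what is there.
 * Returns the number of functions called. */
int run_method(const phase_vector *v, void *req)
{
    int calls = 0;
    for (size_t i = 0; i < v->n; i++) {
        v->fns[i](req);
        calls++;
    }
    return calls;
}

/* Demo phase function that does nothing. */
static int noop(void *req) { (void)req; return 0; }
```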

>  In fact, add_request_phase might even be
> called *during* the request.

Ooh... this can be both bad and good for performance.  I can even imagine
cases where this would be useful, and I'm sure Ralf can too -- since
mod_rewrite sometimes needs to pick up at a later phase (although
that might be fixed by the additions of the various translation phases?).

In theory it could just add_request_phase its handler at init time
and just check a bit in the request structure to decide to execute itself
or not.  This bit could be moved into the method vector gunk and
run_method could take care of it ... If you want to allow for a PRE and
a POST priority they can be combined with the vector solution above
and per_request linked lists for the PRE and the POST handlers.

i.e.  various limited versions of your general suggestion are easy 
to implement efficiently.

I question that you actually need the complete generality that you're
suggesting... what you should do is consider the phases as you suggested
them, and then consider example problems that we have with the present
API and try to solve them with your new API.  Then consider how dynamic
add_request_phase helps, and what features it needs.

Dean

Re: Thoughts on a 2.0 API

Posted by Paul Sutton <pa...@ukweb.com>.

On Mon, 9 Jun 1997, Alexei Kosut wrote:
> Here are some thoughts I've had on the design of an API for Apache
> 2.0. Please feel free to ignore it. I'm mainly talking here, fyi,

> 1. "connection open" phase
> 2. begin request phase
> 3. URL->URL translation
> 4. URL->filename translation
> 5. filename->filename translation
> 6. header parse phase

I would like to see phases renamed so they do not make any assumptions
that the request will be satisfied by a file on the filesystem. I.e.
instead of using the term "filename", use a generic reference to a
physical location (e.g. "URL->local-identifier translation"). Sure I know
that even the current API could map URLs onto arbitrary location
identifiers (as the proxy module does with proxy:) but I would like to see the
module API defined from the start as being file-independent.

> What is the solution? How about making the API dynamic? A module_rec
> might just have an initializer phase (or two - we've discussed
> "run-on-fork" initializers as well as the "run-on-start" we have now),
> and a command table (or something). The initialization function would
> call things like:
> 
>    add_request_phase(&handler_func, HANDLER_PHASE, "text/html", M_GET|M_POST);
>    add_request_phase(&type_func, CHECK_TYPE_PHASE, NULL, M_ANY);

Yes, I think this is essential in 2.0. It's something I've wanted for a
long time (I even have a simple proof-of-concept module I
knocked up around Apache 1.1 time). I'm not sure about setting the request
method like this though - I'd like it to be expandable to arbitrary new
request methods (rather than a predetermined set of constant identifiers),
and modules may only respond to certain URIs for certain request methods
(so perhaps there ought to be a request-method check phase in the API?). 
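
One possible shape for expandable methods, as a sketch only: method
names are registered at runtime and handed an id, from which a mask
bit can still be derived for calls like the ones quoted above. The
functions method_id and method_mask and the 32-entry limit are
hypothetical choices for this example.

```c
#include <string.h>

#define MAX_METHODS 32

static const char *method_names[MAX_METHODS];
static int method_count;

/* Return the id for a method name, registering it on first use.
 * Returns -1 if the table is full. The caller must keep the string
 * alive, since only the pointer is stored. Names are compared
 * case-sensitively, as HTTP method names are case-sensitive. */
int method_id(const char *name)
{
    for (int i = 0; i < method_count; i++) {
        if (strcmp(method_names[i], name) == 0)
            return i;
    }
    if (method_count >= MAX_METHODS)
        return -1;
    method_names[method_count] = name;
    return method_count++;
}

/* Mask bit for a method, or 0 if it could not be registered. */
unsigned long method_mask(const char *name)
{
    int id = method_id(name);
    return id < 0 ? 0UL : 1UL << id;
}
```

A module could then register for method_mask("GET") |
method_mask("PATCH") without the core having to know about PATCH in
advance.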

//pcs