Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2000/04/26 12:27:49 UTC

config hook design

I believe there are three types of hooks into the configuration process
that we can define. These are used for different kinds of capabilities,
have different assumptions (on the part of the code doing the hook), and
operate at different points in the process.

1) Hook the reading of a directive + args from an Apache-style config file

   - this hook will not be available for other types of config files, as
     it is specific to the current format and the configfile_t input
   - provides maximum control over reading/parsing the input file
   - the earliest hook available
   - allows "foreign" syntax in the Apache file (e.g. Perl source)

2) Hook the insertion of a node into the config tree

   - independent of the input mechanism
   - the file is not available
   - provides for "immediate" execution of directives
   - useful for AddModule/LoadModule

3) Hook between tree construction and tree processing

   - allows modules to jigger the tree at will, before processing
   - macro systems, tree rewriting, etc
   - each hook would walk the tree itself (rather than Apache doing a walk
     and calling hooks; we can't do it given the kinds of changes the
     called function may want to make)

Already present:
4) Mapping directives to "commands"  (the final tree walk)
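A minimal C sketch of when hook type (2), hook type (3) and the existing command walk (4) would fire relative to one another. Everything here is hypothetical: `cfg_node` is a much-simplified stand-in for the real tree node (flattened into a linked list for brevity), and the typedef and function names are invented for illustration. Type (1) is left out because it needs the input file itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for a config tree node. */
typedef struct cfg_node {
    const char *directive;
    const char *args;
    struct cfg_node *next;
} cfg_node;

typedef void (*insert_hook_fn)(cfg_node *node);   /* hook type (2) */
typedef void (*tree_hook_fn)(cfg_node *root);     /* hook type (3) */
typedef void (*command_fn)(const cfg_node *node); /* step (4) */

/* Records which stage saw which directive, so the ordering is visible. */
static char trace[256];

static void note(const char *stage, const char *directive) {
    size_t len = strlen(trace);
    snprintf(trace + len, sizeof trace - len, "%s:%s ", stage, directive);
}

/* Construction: append nodes in input order; the insertion hook fires at once. */
static cfg_node *build_tree(cfg_node *nodes, size_t n, insert_hook_fn on_insert) {
    for (size_t i = 0; i < n; i++) {
        nodes[i].next = NULL;
        if (i > 0) nodes[i - 1].next = &nodes[i];   /* append to the tree */
        if (on_insert) on_insert(&nodes[i]);         /* hook type (2) */
    }
    return n ? &nodes[0] : NULL;
}

void process_config(cfg_node *nodes, size_t n, insert_hook_fn on_insert,
                    tree_hook_fn on_tree, command_fn run) {
    cfg_node *root = build_tree(nodes, n, on_insert);
    if (on_tree) on_tree(root);                       /* sees the whole tree */
    for (cfg_node *p = root; p; p = p->next) run(p);  /* the final walk */
}

/* Example hooks. */
static void loadmodule_on_insert(cfg_node *node) {
    if (strcmp(node->directive, "LoadModule") == 0) note("insert", node->directive);
}
static void walk_whole_tree(cfg_node *root) {
    for (cfg_node *p = root; p; p = p->next) note("tree", p->directive);
}
static void run_command(const cfg_node *node) { note("cmd", node->directive); }
```

Fed a LoadModule line followed by a DocumentRoot line, the trace reads insert, then tree, then cmd: the insertion hook runs during construction, the tree hook sees the complete tree, and commands run last.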


The reason for separation between (1) and (2) is pretty simple: there are
certain assumptions about the config input when we say "read a block of
text". That implies we are reading from some kind of file, rather than
(say) an SNMP or LDAP configuration database. To continue to allow the
operation of mod_perl, the (1) hook is required.

(2) is required to properly hook things like AddModule and LoadModule
(rather than gumming up the process with special cases). This is also one
of the easiest places to hook (it doesn't involve tree walks). Note: we
shouldn't really be encouraging hooks at this level -- running commands
immediately (rather than in (4)) is usually not required.

(3) is the best place for macro systems which need to examine and process
the whole tree. Theoretically, these hooks should probably have some kind
of ordering (eval macros before running mod_tree_mangler).
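If type-(3) hooks need ordering, one simple approach is an explicit sort key per hook. The sketch below assumes a hypothetical `register_tree_hook()` taking an integer order (lower runs first) and breaks ties by registration order; it is an illustration, not an existing httpd API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void (*tree_hook_fn)(void *tree);

typedef struct {
    const char *name;
    int order;      /* lower values run first (hypothetical convention) */
    int seq;        /* registration order, used to break ties */
    tree_hook_fn fn;
} tree_hook;

#define MAX_TREE_HOOKS 16
static tree_hook hooks[MAX_TREE_HOOKS];
static int nhooks;

void register_tree_hook(const char *name, int order, tree_hook_fn fn) {
    assert(nhooks < MAX_TREE_HOOKS);
    hooks[nhooks] = (tree_hook){name, order, nhooks, fn};
    nhooks++;
}

static int by_order(const void *a, const void *b) {
    const tree_hook *x = a, *y = b;
    if (x->order != y->order) return x->order < y->order ? -1 : 1;
    return x->seq < y->seq ? -1 : (x->seq > y->seq);
}

/* Runs every registered hook in order; names are written to `ran` so the
 * caller can see the sequence that was used. */
void run_tree_hooks(void *tree, char *ran, size_t cap) {
    qsort(hooks, (size_t)nhooks, sizeof hooks[0], by_order);
    ran[0] = '\0';
    for (int i = 0; i < nhooks; i++) {
        strncat(ran, hooks[i].name, cap - strlen(ran) - 1);
        strncat(ran, " ", cap - strlen(ran) - 1);
        hooks[i].fn(tree);
    }
}

/* Placeholder hooks; real ones would rewrite the tree in place. */
static void expand_macros(void *tree) { (void)tree; }
static void mangle_tree(void *tree) { (void)tree; }
```

Registering the tree mangler first with order 20 and the macro expander second with order 10 still runs the macro expander first, since the sort key, not registration time, decides.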


The actual APIs for these things should be pretty straightforward, so I'm
not going to detail them here. The ideas are the part in question :-)


Note this solves the problem that I raised about mod_perl reading text
blocks. Since LoadModule is invoked during construction, mod_perl
gets loaded and can install (1)-type hooks. Those can read the text block
and drop it into ap_directive_t.data.
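A sketch of the kind of type-(1) hook mod_perl might install once loaded: it recognizes a `<Perl>` section and copies the enclosed lines verbatim into the node's data field instead of parsing them. `line_source` stands in for configfile_t and `cfg_node` for ap_directive_t; both are hypothetical simplifications, as is `perl_read_hook()` itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for configfile_t: the file as an array of lines. */
typedef struct {
    const char **lines;
    size_t count;
    size_t pos;      /* index of the next unread line */
} line_source;

/* Hypothetical stand-in for ap_directive_t, reduced to two fields. */
typedef struct {
    const char *directive;
    char *data;      /* raw, unparsed text owned by the node */
} cfg_node;

/* Type-(1) read hook. Returns 1 if it consumed a <Perl> section, 0 if the
 * current line is not ours (normal parsing continues), -1 on error. */
int perl_read_hook(line_source *src, cfg_node *node) {
    if (src->pos >= src->count || strcmp(src->lines[src->pos], "<Perl>") != 0)
        return 0;
    size_t start = ++src->pos;   /* first line after <Perl> */
    size_t total = 1;            /* room for the terminating NUL */
    while (src->pos < src->count && strcmp(src->lines[src->pos], "</Perl>") != 0)
        total += strlen(src->lines[src->pos++]) + 1;   /* line plus '\n' */
    if (src->pos == src->count)
        return -1;               /* unterminated <Perl> section */

    char *buf = malloc(total);
    if (buf == NULL)
        return -1;
    size_t off = 0;
    for (size_t i = start; i < src->pos; i++) {   /* copy text verbatim */
        size_t len = strlen(src->lines[i]);
        memcpy(buf + off, src->lines[i], len);
        off += len;
        buf[off++] = '\n';
    }
    buf[off] = '\0';
    src->pos++;                  /* step past </Perl> */
    node->directive = "<Perl>";
    node->data = buf;
    return 1;
}
```

Because the hook never tokenizes the Perl source, the core parser never has to understand it; a later mod_perl handler can evaluate `node->data` directly.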


Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: config hook design

Posted by rb...@covalent.net.
> > is no better than the configuration modules.  BTW, two tools because one
> > has to convert to a file format that people can hack, and the second has
> > to convert back.
> 
> Huh? Why is there ever a need to convert back. Humans shouldn't be
> hacking on both sides.

Yep, humans work on one side, but the server works on the other.  The
steps to edit a config file are:

convert from dir structure to single file
hack config file
convert back to allow Apache to understand config.

> 
> > Well, we have the idea of inheritance in our config files, but this system
> > means that you need a directory even if you aren't changing anything,
> > because you might change something in a sub-directory.
> 
> I can't parse this.

Take this example.  I have DocRoot, DocRoot/foo, and DocRoot/foo/bar.  In
the 1.3 config file I have:

<Directory DocRoot>
    AllowOverride ALL
</Directory>

<Directory DocRoot/foo/bar>
    AllowOverride NONE
</Directory>

AllowOverride is on for DocRoot and DocRoot/foo, but off for
DocRoot/foo/bar.

In a directory structure, you would need three Directories, one for each,
but the foo directory would be basically empty, because it inherits
everything from DocRoot.  This just opens up another place for people to
forget to look.
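To make the inheritance point concrete: in the single-file form only the two sections above need to exist, and DocRoot/foo picks up its setting from DocRoot. The C sketch below resolves the effective AllowOverride value by choosing the longest matching <Directory> path. This is a deliberate simplification for illustration (httpd's real per-directory merging is more involved), and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *path;           /* directory the section applies to */
    const char *allow_override; /* "All" or "None" in this sketch */
} dir_section;

/* Returns 1 if `dir` equals `path` or lies beneath it. */
static int is_within(const char *path, const char *dir) {
    size_t n = strlen(path);
    return strncmp(path, dir, n) == 0 && (dir[n] == '\0' || dir[n] == '/');
}

/* Effective setting for `dir`: the longest matching section wins, so a
 * directory with no section of its own simply inherits from its parent.
 * Returns NULL when no section applies. */
const char *effective_override(const dir_section *secs, size_t n, const char *dir) {
    const char *result = NULL;
    size_t best = 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(secs[i].path);
        if (is_within(secs[i].path, dir) && (result == NULL || len > best)) {
            result = secs[i].allow_override;
            best = len;
        }
    }
    return result;
}
```

With sections for DocRoot (All) and DocRoot/foo/bar (None), asking about DocRoot/foo yields All without any entry for foo.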

> But don't hang the rest of my argument for out-of-process
> configuration on this one proposal. 

I'm just dealing with a qmail style config file.  Others, who need the
in-process config option, are discussing that.  I think they are doing a
fine job explaining why having the ability to do config parsing in-process
is useful.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: config hook design

Posted by Manoj Kasichainula <ma...@io.com>.
On Thu, Apr 27, 2000 at 06:59:16AM -0400, rbb@covalent.net wrote:
> On Wed, 26 Apr 2000, Manoj Kasichainula wrote:
> 
> > On Wed, Apr 26, 2000 at 05:22:36PM -0400, rbb@covalent.net wrote:
> > > I thoroughly dislike the qmail approach for Apache.  It just doesn't fit
> > > well with what Apache does IMHO.
> > 
> > In what way? Remember, I don't expect mere mortals to hack on this
> > tree; a preparser will be bundled with Apache for now.
> 
> Pardon my saying this, but that's idiotic.  :-)  What we have then, is a
> config system that "mere mortals" can't use without two extra tools.

OK, I was being a bit flippant there. I think that such a structure is
reasonable for normal people to use once they've actually read the
documentation on how it works; for example, this is similar to how the
Linux kernel can be configured at runtime.

But, some people prefer hacking on a single, commented file. We would
still bundle a preparser to break down the config and run checks that
we want done before an Apache server thinks about restarting, whether
a multi-file or single-file Apache config is used.

> This
> is no better than the configuration modules.  BTW, two tools because one
> has to convert to a file format that people can hack, and the second has
> to convert back.

Huh? Why is there ever a need to convert back. Humans shouldn't be
hacking on both sides.

> Well, we have the idea of inheritance in our config files, but this system
> means that you need a directory even if you aren't changing anything,
> because you might change something in a sub-directory.

I can't parse this.

> In general, I just dislike this design for something like a web
> server.  It doesn't buy us anything, IMO, and it hurts quite a bit.

I'm quite open to the possibility of a single-file config system too;
I've proposed the qmail-style configuration because it's extremely
easy for outside scripts to hack on.

> This
> is not easier to admin, if anything I think this is a harder design to
> admin.

I think this is a personal preference thing. Talk to a ReiserFS guy
sometime. :) As I've used this kind of setup more and more, I've been
growing to really like it (though qmail itself still annoys me). I've
also been worried about it because I don't think opening dozens of
files on startup is going to be very fast.

But don't hang the rest of my argument for out-of-process
configuration on this one proposal. 


Re: config hook design

Posted by Greg Stein <gs...@lyra.org>.
On Wed, 26 Apr 2000, Manoj Kasichainula wrote:
> On Wed, Apr 26, 2000 at 05:22:36PM -0400, rbb@covalent.net wrote:
>...
> > It also breaks all of the tools that are
> > currently used for Apache configuration.
> 
> 1.3-compatibility pre-parser, which could even be the default 2.0
> preparser. Besides, the fact that 2.0 is different shouldn't surprise
> anyone.

There is no excuse for arbitrarily breaking tools. Patching over the
problem by throwing more stuff at the problem you caused is not truly
solving it. You're just hiding it.

The latest Netcraft survey came out. Apache is still going up. Those 8 million
users aren't going to be happy if we capriciously start making their life
difficult.

Cheers,
-g



Re: config hook design

Posted by rb...@covalent.net.
On Wed, 26 Apr 2000, Manoj Kasichainula wrote:

> On Wed, Apr 26, 2000 at 05:22:36PM -0400, rbb@covalent.net wrote:
> > I thoroughly dislike the qmail approach for Apache.  It just doesn't fit
> > well with what Apache does IMHO.
> 
> In what way? Remember, I don't expect mere mortals to hack on this
> tree; a preparser will be bundled with Apache for now.

Pardon my saying this, but that's idiotic.  :-)  What we have then, is a
config system that "mere mortals" can't use without two extra tools.  This
is no better than the configuration modules.  BTW, two tools because one
has to convert to a file format that people can hack, and the second has
to convert back.

Now, where does this not fit in with Apache?

Well, we have the idea of inheritance in our config files, but this system
means that you need a directory even if you aren't changing anything,
because you might change something in a sub-directory.

In general, I just dislike this design for something like a web
server.  It doesn't buy us anything, IMO, and it hurts quite a bit.  This
is not easier to admin, if anything I think this is a harder design to
admin.  This design makes sense for some servers, but not for Apache.

Ryan



Re: config hook design

Posted by Manoj Kasichainula <ma...@io.com>.
On Wed, Apr 26, 2000 at 05:22:36PM -0400, rbb@covalent.net wrote:
> I thoroughly dislike the qmail approach for Apache.  It just doesn't fit
> well with what Apache does IMHO.

In what way? Remember, I don't expect mere mortals to hack on this
tree; a preparser will be bundled with Apache for now.

The only disadvantage that I can think of is that it could be slow to
read. Iff that's the case, I have no objection to a pure serialized
form of this, i.e. a 100% pure XML config file. All of my arguments
apply just as well in that case.

> It also breaks all of the tools that are
> currently used for Apache configuration.

1.3-compatibility pre-parser, which could even be the default 2.0
preparser. Besides, the fact that 2.0 is different shouldn't surprise
anyone.


Re: config hook design

Posted by Greg Stein <gs...@lyra.org>.
On Wed, 26 Apr 2000, Manoj Kasichainula wrote:
> On Wed, Apr 26, 2000 at 03:27:49AM -0700, Greg Stein wrote:
>...
> > 2) Hook the insertion of a node into the config tree
> > 
> >    - independent of the input mechanism
> >    - the file is not available
> >    - provides for "immediate" execution of directives
> >    - useful for AddModule/LoadModule
> 
> My opinion is still solidifying on all this. I'm in random rambling
> stage.
> 
> Are there cases where this is useful besides these two directives? I

Logging.

I'm still trying to get a handle on how logging configuration SHOULD be
done. Once that has gelled, we'll see where/how it fits. At the
moment, I believe the logging will also make use of the insertion-time
hook.

> think it gets confusing to have ordering skewed with this kind of
> hook, so I'd rather avoid it. Sounds like you would too.

No skew. Just different execution times. Ryan continues to describe this
kind of stuff as separate passes over the tree. I fold construction and a
followup pass into one process (since this produces the same, net effect).
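The "same net effect" claim can be checked on a toy model. In this hypothetical C sketch, one variant records LoadModule lines from an insertion-time hook while building, the other builds first and then makes a separate pass; both produce the same load sequence. They would only diverge if something during construction depended on a module already being loaded. All names are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { const char *directive, *args; } cfg_line;

/* Append the module named by a LoadModule line to `out` (the "load" log). */
static void maybe_load(const cfg_line *l, char *out, size_t cap) {
    if (strcmp(l->directive, "LoadModule") == 0) {
        size_t len = strlen(out);
        snprintf(out + len, cap - len, "%s;", l->args);
    }
}

/* Folded: the insertion-time hook fires as each node joins the tree. */
void build_with_insert_hook(const cfg_line *in, size_t n, cfg_line *tree,
                            char *out, size_t cap) {
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        tree[i] = in[i];
        maybe_load(&tree[i], out, cap);
    }
}

/* Separate: build the whole tree, then make a second pass for LoadModule. */
void build_then_pass(const cfg_line *in, size_t n, cfg_line *tree,
                     char *out, size_t cap) {
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) tree[i] = in[i];
    for (size_t i = 0; i < n; i++) maybe_load(&tree[i], out, cap);
}
```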

> I'm actually leaning towards more of a configuration language where
> ordering is irrelevant.

This can only occur to a point. There are at least three things that I can
think of to monkey this:

1) user/group and the corresponding setuid/setgid. This may get worse
   if/when we start forking children that have distinct uid/gid pairs.

2) module loading

3) log configuration and opening


> ISTR that this isn't the case in the current
> config structure, but I'm not too clueful on what's in the code right
> now.

The tree is ordered according to the input file. It preserves the
*existing* semantics of Apache's ordered configuration file.

Remember: we're taking steps. At this point, it was very important not to
alter Apache's semantic model of configuration. That can/will come later
once the kinks are out of this step (witness the problem in building the
tree).

> Now, the question is then how to allow LoadModule/AddModule
> constructs. If we didn't have to tell the user when there were unknown
> directives present, this would be easy enough, but unfortunately, I
> guess we have to prevent users from shooting themselves in the foot.

Absolutely.

> Hmmmm, though that could be the job of our config file
> preprocessor...

Bleck. You're talking about developing whole new tool suites just to avoid
a simple, optional hook in the code (for *others* to take advantage of;
not us).

Cheers,
-g



Re: config hook design

Posted by rb...@covalent.net.
> > 1)  Read the config.  As we read the file, we check the directive to
> > determine if it is known.  If it is known, we do what the command tells us
> > (either parse it to directive/args or use raw text), and store a pointer
> > to the function that handles this command (no sense looking it up multiple
> > times).  If the directive is unknown, we store the directive as raw text,
> > if it is a container read the whole container as one node in raw text.
> 
> *) most directives will be unknown unless you have some way to load
>    modules *DURING* this process. therefore, you are saying that most
>    directives would be unprocessed.

Unfortunately, yes.  But, without processing directives, the read/build
step would be fast, so this is really a wash.

> *) we should not store unknown containers as raw text. the contents could
>    very well be additional configuration directives. by storing as raw
>    text, you require a second pass over the text to parse out the
>    directives. this increases the complexity of the process as you
>    parse/walk/parse-again/insert-into-tree/walk-child-tree ...

The problem is you have no way of knowing how a container wants its
information stored.  I agree this causes an annoying re-walk step, but how
else do we keep <Perl> sections unparsed?  Without forcing mod_perl to go
back to the file?

> *) I agree with recognizing commands at construction time. This is
>    practically a requirement for performing validation -- effective
>    validation requires the parser to know when it has just read an invalid
>    command. It has the context at that point. Waiting for a separate walk
>    of the tree will (typically) lose the context of where the erroneous
>    directive came from. We hack this now by storing file/line in the
>    directive structure (which is bogus in the long run).
> 
> > 2)  First pass through config.  Grab all LoadModule commands and load all
> > modules
> 
> This can be folded into (1) with no effective change in end semantics. A
> LoadModule executed during build is effectively the same as a LoadModule
> executed in a separate pass. You will still end up with a sequence of
> loaded modules, loading in a particular order.
>
> Even better: loading *during* the construction allows more directives to
> be recognized.
> 

This can't be folded into step 1.  Take the example of a LoadModule within
an <IfModule> container.  Now, not only are you executing the LoadModule
directive while reading the file, you are also executing any branching
directives while reading the file.  You're adding complexity to something
that can easily be made simple.
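A sketch of the coupling described here: a single read pass that executes LoadModule immediately must also evaluate <IfModule> tests as it goes, against whatever has been loaded so far. The line format (module names as bare arguments, no `!` negation) and all names are hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *directive, *args; } cfg_line;

static int is_loaded(const char **loaded, size_t n, const char *module) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(loaded[i], module) == 0) return 1;
    return 0;
}

/* One read pass that runs LoadModule on sight. Each <IfModule> is tested
 * against the modules loaded *so far*, so the reader now carries branch
 * state. Returns the number of modules loaded into `loaded`. */
size_t read_and_load(const cfg_line *in, size_t n, const char **loaded, size_t max) {
    size_t nloaded = 0;
    int depth = 0;
    int skip_from = -1;   /* depth of the outermost false <IfModule>, or -1 */
    for (size_t i = 0; i < n; i++) {
        const char *d = in[i].directive;
        if (strcmp(d, "<IfModule") == 0) {
            depth++;
            if (skip_from < 0 && !is_loaded(loaded, nloaded, in[i].args))
                skip_from = depth;
        } else if (strcmp(d, "</IfModule>") == 0) {
            if (skip_from == depth) skip_from = -1;
            depth--;
        } else if (skip_from < 0 && strcmp(d, "LoadModule") == 0 && nloaded < max) {
            loaded[nloaded++] = in[i].args;
        }
    }
    return nloaded;
}
```

A LoadModule inside <IfModule so_module> runs only if so_module was loaded earlier in the same file; moving the condition's module later in the file changes the outcome, which is the extra read-time complexity at issue.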

Yes, I agree that three passes through the tree sucks, and I would love to
minimize this, but I would rather write the code and make it work before
we start optimizing.  The more alphas we have with a broken config
because we can't decide the "right" way to do this, the slower people will
be to adopt 2.0.

> > 4)  Second pass through config.  Any directive that wasn't caught before,
> > catch it now.  This means we find the directive, parse it to
> > directive/args or leave it as raw text.  Store a pointer to the directive
> > handler.  This would also be the stage where potentially we could have
> > modules hook the config phase, so mod_macro or the include directive could
> > munge the tree here.
> 
> mod_macro occurs above.

Yep, I'm an idiot.  :-)

> IMO, your argument processing occurred during your step (1). In the long
> run, we're going to do it there for proper validation. May as well
> design/specify it that way.

IMO, it doesn't work, unless you start loading modules when you encounter
a LoadModule command, which as I stated above, is a slippery slope,
because once you start executing one directive at load time, you also have
to execute others.

> > 5)  Third pass, actually walk the config and configure the server.
> 
> My step (4), which is really just current behavior.

Yep.

> > I thoroughly dislike the qmail approach for Apache.  It just doesn't fit
> > well with what Apache does IMHO.  It also breaks all of the tools that are
> > currently used for Apache configuration.
> 
> Agreed.
> 
> Cheers,
> -g

Essentially, Greg, we are saying the same thing; we just differ on where and
when the optimizations should be done.  Let's code it without
optimizations, so that module writers can use it, and optimize it when we
are done.

Ryan



Re: config hook design

Posted by Greg Stein <gs...@lyra.org>.
On Wed, 26 Apr 2000 rbb@covalent.net wrote:
>...
> 1)  Read the config.  As we read the file, we check the directive to
> determine if it is known.  If it is known, we do what the command tells us
> (either parse it to directive/args or use raw text), and store a pointer
> to the function that handles this command (no sense looking it up multiple
> times).  If the directive is unknown, we store the directive as raw text,
> if it is a container read the whole container as one node in raw text.

*) most directives will be unknown unless you have some way to load
   modules *DURING* this process. therefore, you are saying that most
   directives would be unprocessed.

*) we should not store unknown containers as raw text. the contents could
   very well be additional configuration directives. by storing as raw
   text, you require a second pass over the text to parse out the
   directives. this increases the complexity of the process as you
   parse/walk/parse-again/insert-into-tree/walk-child-tree ...

*) I agree with recognizing commands at construction time. This is
   practically a requirement for performing validation -- effective
   validation requires the parser to know when it has just read an invalid
   command. It has the context at that point. Waiting for a separate walk
   of the tree will (typically) lose the context of where the erroneous
   directive came from. We hack this now by storing file/line in the
   directive structure (which is bogus in the long run).
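The validation argument, sketched: look the directive up while the line is being read, so an error can be reported with the file name and line number still in hand, and cache the handler pointer in the node so no later pass has to look it up again. The command table, handlers, and error wording are hypothetical and are not httpd's actual messages.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*cmd_handler)(const char *args);   /* returns 1 if args are valid */

static int set_document_root(const char *args) { return args[0] == '/'; }
static int set_server_name(const char *args) { return args[0] != '\0'; }

/* Hypothetical command table. */
static const struct { const char *name; cmd_handler handler; } commands[] = {
    {"DocumentRoot", set_document_root},
    {"ServerName", set_server_name},
};

typedef struct {
    const char *directive, *args;
    cmd_handler handler;   /* cached so later passes need no second lookup */
} cfg_node;

/* Called while reading line `line` of `file`. Returns 0 and fills `node`
 * on success, or -1 with a located message in `err`. */
int build_node(const char *file, int line, const char *directive,
               const char *args, cfg_node *node, char *err, size_t errlen) {
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strcmp(commands[i].name, directive) == 0) {
            node->directive = directive;
            node->args = args;
            node->handler = commands[i].handler;
            return 0;
        }
    }
    /* The reader still knows exactly where it is, so no file/line needs
     * to be stashed in the node just for error reporting. */
    snprintf(err, errlen, "%s:%d: invalid command '%s'", file, line, directive);
    return -1;
}
```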

> 2)  First pass through config.  Grab all LoadModule commands and load all
> modules

This can be folded into (1) with no effective change in end semantics. A
LoadModule executed during build is effectively the same as a LoadModule
executed in a separate pass. You will still end up with a sequence of
loaded modules, loading in a particular order.

Even better: loading *during* the construction allows more directives to
be recognized.
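A toy model of this point, under stated assumptions: a hypothetical table maps each directive to the module that provides it, LoadModule's argument is just the module name (no file path), and a directive counts as recognized at construction time only if its module is already loaded. Loading on LoadModule during the build lets a later PerlModule line be bound immediately; deferring all loading leaves it unknown until a later pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *directive, *args; } cfg_line;

/* Hypothetical table: which module supplies which directive. */
static const struct { const char *module, *directive; } provided[] = {
    {"perl_module", "PerlModule"},
    {"rewrite_module", "RewriteEngine"},
};

#define MAX_LOADED 8

static int is_loaded(const char **loaded, size_t n, const char *module) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(loaded[i], module) == 0) return 1;
    return 0;
}

static const char *owner_of(const char *directive) {
    for (size_t i = 0; i < sizeof provided / sizeof provided[0]; i++)
        if (strcmp(provided[i].directive, directive) == 0) return provided[i].module;
    return NULL;
}

/* Counts directives whose handler could be bound while building the tree. */
size_t recognized_during_build(const cfg_line *in, size_t n, int load_during_build) {
    const char *loaded[MAX_LOADED];
    size_t nloaded = 0, recognized = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(in[i].directive, "LoadModule") == 0) {
            if (load_during_build && nloaded < MAX_LOADED)
                loaded[nloaded++] = in[i].args;   /* module active from here on */
            continue;
        }
        const char *owner = owner_of(in[i].directive);
        if (owner != NULL && is_loaded(loaded, nloaded, owner))
            recognized++;   /* otherwise it would be kept as an unknown node */
    }
    return recognized;
}
```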

> 3)  Run pre_config hook for all modules (MPM and standard)

This is my hook type (3).

> 4)  Second pass through config.  Any directive that wasn't caught before,
> catch it now.  This means we find the directive, parse it to
> directive/args or leave it as raw text.  Store a pointer to the directive
> handler.  This would also be the stage where potentially we could have
> modules hook the config phase, so mod_macro or the include directive could
> munge the tree here.

mod_macro occurs above.

IMO, your argument processing occurred during your step (1). In the long
run, we're going to do it there for proper validation. May as well
design/specify it that way.

> 5)  Third pass, actually walk the config and configure the server.

My step (4), which is really just current behavior.

> This is very very close to what others have suggested, so it is not my
> idea, it is just the one I like the best.  This also allows mod_perl to
> add perl into the config file without hurting us or making the config
> ugly.

Not quite sure about this; comments were above.

> I thoroughly dislike the qmail approach for Apache.  It just doesn't fit
> well with what Apache does IMHO.  It also breaks all of the tools that are
> currently used for Apache configuration.

Agreed.

Cheers,
-g




Re: config hook design

Posted by rb...@covalent.net.
I've been thinking about this all day and reading the ramblings of other
people, and here is what I would like to see.  I'll be implementing this over
the next day or two, and posting a patch so that others can comment.  :-)

1)  Read the config.  As we read the file, we check the directive to
determine if it is known.  If it is known, we do what the command tells us
(either parse it to directive/args or use raw text), and store a pointer
to the function that handles this command (no sense looking it up multiple
times).  If the directive is unknown, we store the directive as raw text,
if it is a container read the whole container as one node in raw text.

2)  First pass through config.  Grab all LoadModule commands and load all
modules

3)  Run pre_config hook for all modules (MPM and standard)

4)  Second pass through config.  Any directive that wasn't caught before,
catch it now.  This means we find the directive, parse it to
directive/args or leave it as raw text.  Store a pointer to the directive
handler.  This would also be the stage where potentially we could have
modules hook the config phase, so mod_macro or the include directive could
munge the tree here.

5)  Third pass, actually walk the config and configure the server.
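A compact C sketch of the five steps above, assuming a hypothetical command table in which each directive belongs either to the core or to a named module. Step (3) is left as a comment, nodes are a flat array, and all names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command table: each directive belongs to the core or to a
 * loadable module. */
typedef struct { const char *directive, *module; } cmd_entry;
static const cmd_entry table[] = {
    {"LoadModule", "core"},
    {"ServerName", "core"},
    {"PerlModule", "perl_module"},
};
#define NCMDS (sizeof table / sizeof table[0])
#define MAX_LOADED 8

typedef struct {
    const char *directive, *args;
    int cmd;   /* index into table, or -1 while unknown (kept as raw text) */
} cfg_node;

static int lookup(const char *directive, const char **loaded, size_t nloaded) {
    for (size_t i = 0; i < NCMDS; i++) {
        if (strcmp(table[i].directive, directive) != 0) continue;
        if (strcmp(table[i].module, "core") == 0) return (int)i;
        for (size_t j = 0; j < nloaded; j++)
            if (strcmp(loaded[j], table[i].module) == 0) return (int)i;
    }
    return -1;
}

/* Returns the number of commands run, or -1 for an unknown directive.
 * `out` receives the directives in the order they were executed. */
int configure(cfg_node *tree, size_t n, char *out, size_t cap) {
    const char *loaded[MAX_LOADED];
    size_t nloaded = 0;
    out[0] = '\0';

    /* (1) Read: bind what is known now, leave the rest unknown. */
    for (size_t i = 0; i < n; i++)
        tree[i].cmd = lookup(tree[i].directive, loaded, nloaded);

    /* (2) First pass: load every module named by LoadModule. */
    for (size_t i = 0; i < n; i++)
        if (strcmp(tree[i].directive, "LoadModule") == 0 && nloaded < MAX_LOADED)
            loaded[nloaded++] = tree[i].args;

    /* (3) pre_config hooks for all modules would run here. */

    /* (4) Second pass: resolve whatever was not caught before. */
    for (size_t i = 0; i < n; i++) {
        if (tree[i].cmd < 0)
            tree[i].cmd = lookup(tree[i].directive, loaded, nloaded);
        if (tree[i].cmd < 0)
            return -1;
    }

    /* (5) Third pass: walk the config in file order and run each command. */
    for (size_t i = 0; i < n; i++) {
        strncat(out, table[tree[i].cmd].directive, cap - strlen(out) - 1);
        strncat(out, " ", cap - strlen(out) - 1);
    }
    return (int)n;
}
```

One property of this ordering, visible in the sketch: a directive that appears before its LoadModule line still resolves in step (4), because step (2) loads every module before the second pass.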

This is very very close to what others have suggested, so it is not my
idea, it is just the one I like the best.  This also allows mod_perl to
add perl into the config file without hurting us or making the config
ugly.

I thoroughly dislike the qmail approach for Apache.  It just doesn't fit
well with what Apache does IMHO.  It also breaks all of the tools that are
currently used for Apache configuration.

Ryan



Re: config hook design

Posted by Manoj Kasichainula <ma...@io.com>.
On Wed, Apr 26, 2000 at 03:27:49AM -0700, Greg Stein wrote:
> 1) Hook the reading of a directive + args from an Apache-style config file
> 3) Hook between tree construction and tree processing

You've seen my alternative suggestions to these.

> 2) Hook the insertion of a node into the config tree
> 
>    - independent of the input mechanism
>    - the file is not available
>    - provides for "immediate" execution of directives
>    - useful for AddModule/LoadModule

My opinion is still solidifying on all this. I'm in random rambling
stage.

Are there cases where this is useful besides these two directives? I
think it gets confusing to have ordering skewed with this kind of
hook, so I'd rather avoid it. Sounds like you would too.

I'm actually leaning towards more of a configuration language where
ordering is irrelevant. ISTR that this isn't the case in the current
config structure, but I'm not too clueful on what's in the code right
now.

Now, the question is then how to allow LoadModule/AddModule
constructs. If we didn't have to tell the user when there were unknown
directives present, this would be easy enough, but unfortunately, I
guess we have to prevent users from shooting themselves in the foot.

Hmmmm, though that could be the job of our config file
preprocessor...