Posted to dev@httpd.apache.org by Dean Gaudet <dg...@arctic.org> on 1997/07/01 03:10:11 UTC

directory_walk and some config changes

I want to look a bit at <Directory> and <Location>.  They're both stored
in lists which order them in the order they appear within a virtual host
(or within the main server).  And then at the server merge time, the
main server's list is tacked onto the end of each virtual server's list.
At request time, each list is scanned in order, and each match is merged
into the per_dir config for the request.

In PR#717 Lars Eilebrecht showed that for at least <Location> this ordering
is messed up.  Suppose the main server has:

    <Location />
        blah blah
    </Location>

There is no way for a virtual host to override that because this section
will match *after* all of the virtual host's sections.

The same will happen with <Directory>, but the effects are usually less
severe because of how <Directory> is handled (more later).

This is bogus, and I think that the main server's sections should be
ordered before the virtual host's.  This allows virtual host settings
to override main server settings (which is how we typically implement
everything else).
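
To make the ordering problem concrete, here is a rough Python sketch (not
Apache code). It assumes a section is just a (prefix, settings) pair and that
a later merge overrides an earlier one; merge_matching and the "owner" setting
are made-up names for illustration only.

```python
# Hypothetical model: a section is (url_prefix, settings). Sections are
# scanned in list order and each match is merged, so later matches win.
def merge_matching(sections, uri):
    config = {}
    for prefix, settings in sections:
        if uri.startswith(prefix):
            config.update(settings)
    return config

main_sections = [("/", {"owner": "main"})]
vhost_sections = [("/", {"owner": "vhost"})]

# Current behaviour: the main server's list is tacked onto the end.
current = merge_matching(vhost_sections + main_sections, "/index.html")

# Proposed behaviour: the main server's sections come first.
proposed = merge_matching(main_sections + vhost_sections, "/index.html")
```

With the current order the main server's <Location /> wins ("main"); with the
proposed order the virtual host's setting overrides it ("vhost").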

Now, on to <Directory>.  A brief recap of how directory_walk works,
excuse the syntax:

    per_dir_config = server->lookup_defaults;
    test_filename = "";
    for each component C in r->filename
        test_filename = test_filename . C;

        for each <Directory> section S
            if test_filename matches S then
                merge( per_dir_config, S );
            endif
        endfor

        if overrides allowed then
            test for .htaccess, parse it and merge it if it exists
        endif
    endfor

You'll observe first of all that if you have N <Directory> sections and
on average your file paths have M components then this is an O(N*M) loop,
kinda yucky.
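
For anyone who wants to poke at the cost, the recap above can be approximated
in runnable form. This is a simplified Python sketch under assumptions: plain
string equality stands in for the real matching, .htaccess handling is left
out, and the function names and the comparison counter are invented for the
example.

```python
def directory_prefixes(filename):
    """Yield "/", "/a", "/a/b", ... for the directories above filename."""
    dirs = [p for p in filename.split("/") if p][:-1]  # drop the file itself
    for depth in range(len(dirs) + 1):
        yield "/" + "/".join(dirs[:depth])

def directory_walk(filename, sections):
    """sections: list of (directory_path, settings) in config order."""
    config = {}
    comparisons = 0
    for test_filename in directory_prefixes(filename):
        # Every section is tested against every prefix: N * M comparisons.
        for path, settings in sections:
            comparisons += 1
            if path == test_filename:
                config.update(settings)
    return config, comparisons

sections = [
    ("/a/b", {"x": "foo1"}),
    ("/", {"y": "root"}),
    ("/a/b", {"x": "foo2"}),
]
config, comparisons = directory_walk("/a/b/c.html", sections)
```

Three sections tested against three prefixes ("/", "/a", "/a/b") gives nine
comparisons, even though only three of them can ever match.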

I want to optimize this so that we sort the <Directory> sections by the
number of components in their string.  We preserve order of course, so
that if:

    <Directory /a/b>
        ... foo1
    </Directory>

    <Directory /a/b>
        ... foo2
    </Directory>

appears in the config then they'll remain in that order once sorted into
the "two components" list.

With that optimization the loop is O(N+M).
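
A sketch of the bucketing idea in Python, under the same simplifying
assumptions as before (exact string matches only, no wildcards, regexes, or
.htaccess). The names build_buckets and directory_walk_bucketed are
hypothetical; the point is only that each section is looked at once, at the
depth equal to its component count.

```python
from collections import defaultdict

def component_count(path):
    return len([p for p in path.split("/") if p])

def build_buckets(sections):
    """Group sections by component count, preserving config order."""
    buckets = defaultdict(list)
    for path, settings in sections:
        buckets[component_count(path)].append((path, settings))
    return buckets

def directory_walk_bucketed(filename, buckets):
    config = {}
    comparisons = 0
    dirs = [p for p in filename.split("/") if p][:-1]  # drop the file itself
    for depth in range(len(dirs) + 1):
        test_filename = "/" + "/".join(dirs[:depth])
        # Only sections with exactly `depth` components can match here.
        for path, settings in buckets.get(depth, ()):
            comparisons += 1
            if path == test_filename:
                config.update(settings)
    return config, comparisons

sections = [
    ("/a/b", {"x": "foo1"}),
    ("/", {"y": "root"}),
    ("/a/b", {"x": "foo2"}),
]
buckets = build_buckets(sections)
config, comparisons = directory_walk_bucketed("/a/b/c.html", buckets)
```

Because appending to each bucket keeps config order, foo2 is still merged
after foo1, and the walk does at most N comparisons plus one step per
component.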

There are two complications:  *? wildcarding, and regular expressions.

*? wildcarding is easy to deal with if we decree that wildcards don't
cross component boundaries (they currently do, see the comment right
before strcmp_match).  This would match filesystem/shell semantics
more closely.
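
The difference between the two wildcard semantics can be illustrated in
Python. Note the assumptions: fnmatchcase from the standard library stands in
for strcmp_match (it is not the same function and also understands [seq]
patterns), and match_per_component is a made-up helper showing one possible
way to keep * and ? inside a single component.

```python
from fnmatch import fnmatchcase

def match_crossing(pattern, path):
    """Wildcards may span "/" (roughly the current behaviour)."""
    return fnmatchcase(path, pattern)

def match_per_component(pattern, path):
    """Wildcards are matched one path component at a time."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        fnmatchcase(component, part)
        for part, component in zip(pattern_parts, path_parts)
    )

path = "/home/dean/public_html"
crossing = match_crossing("/home/*", path)
strict = match_per_component("/home/*", path)
strict_full = match_per_component("/home/*/public_html", path)
```

Under the per-component rule "/home/*" no longer matches
/home/dean/public_html, but "/home/*/public_html" does, which also means a
wildcard pattern has a fixed component count and can be sorted into a bucket
like any other section.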

Regexs on the other hand I would like to process *after* doing the
component by component matches above.  In effect this makes <Directory ~ foo>
very similar to <Files ~ foo>.  One reason I want to do this is that
currently, there's this oddball behaviour:

    <Directory ~ abc>
        ... whatever
    </Directory>

Then consider a request for /abc/def/foo.html, which maps to
/docroot/abc/def/foo.html.  That directory section will be matched and
merged (at least) twice.  i.e.

/                       no match
/docroot                no match
/docroot/abc            match, merge
/docroot/abc/def        match, merge
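
The double merge is easy to reproduce with a small Python sketch. This is
only a model: Python's re module stands in for the server's regex matching,
and the "proposed" variant assumes the regex would be tested once against the
final path after the walk, which is one reading of the proposal rather than a
settled design.

```python
import re

def merges_during_walk(dir_prefixes, pattern):
    """Current behaviour: the regex is tried at every step of the walk."""
    regex = re.compile(pattern)
    return [prefix for prefix in dir_prefixes if regex.search(prefix)]

def merges_after_walk(filename, pattern):
    """Assumed proposed behaviour: one test against the final path."""
    return [filename] if re.search(pattern, filename) else []

dir_prefixes = ["/", "/docroot", "/docroot/abc", "/docroot/abc/def"]
current = merges_during_walk(dir_prefixes, "abc")
proposed = merges_after_walk("/docroot/abc/def/foo.html", "abc")
```

The current scheme merges the section at both /docroot/abc and
/docroot/abc/def; testing once after the walk merges it a single time.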

Anyone here use Directory regexs?  How would this impact you?

I don't want to start coding this until I'm sure people are happy with
the change.  It changes some of the more subtle aspects of our
config, stuff people probably don't use a lot.

Dean


Re: directory_walk and some config changes

Posted by Rasmus Lerdorf <ra...@lerdorf.on.ca>.
> I thought about it and it makes sense.  I'd love it if somehow we could
> create a tree model in memory in which configuration info was kept, so
> per-dir stuff could be even more efficient by walking a tree rather than
> comparing an entire list...

That would certainly make mod_info a trivial piece of code, instead of the
current jumble we have right now.

-Rasmus


Re: directory_walk and some config changes

Posted by Brian Behlendorf <br...@organic.com>.
I thought about it and it makes sense.  I'd love it if somehow we could
create a tree model in memory in which configuration info was kept, so
per-dir stuff could be even more efficient by walking a tree rather than
comparing an entire list...

	Brian


At 02:45 PM 7/7/97 -0700, you wrote:
>I didn't get any negative comments on this, so I'm going to proceed with
>the semantic changes below.  The regex semantics don't have to be changed
>as radically as I indicate below, but I'll start off that way and see how
>it goes. 


--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
"Why not?" - TL           brian@organic.com - hyperreal.org - apache.org

Re: directory_walk and some config changes

Posted by Dean Gaudet <dg...@arctic.org>.
I didn't get any negative comments on this, so I'm going to proceed with
the semantic changes below.  The regex semantics don't have to be changed
as radically as I indicate below, but I'll start off that way and see how
it goes. 

Dean

On Mon, 30 Jun 1997, Dean Gaudet wrote:

> I want to look a bit at <Directory> and <Location>.  They're both stored
> in lists which order them in the order they appear within a virtual host
> (or within the main server).  And then at the server merge time, the
> main server's list is tacked onto the end of each virtual server's list.
> At request time, each list is scanned in order, and each match is merged
> into the per_dir config for the request.
> 
> In PR#717 Lars Eilebrecht showed that for at least <Location> this ordering
> is messed up.  Suppose the main server has:
> 
>     <Location />
>         blah blah
>     </Location>
> 
> There is no way for a virtual host to override that because this section
> will match *after* all of the virtual host's sections.
> 
> The same will happen with <Directory>, but the effects are usually less
> severe because of how <Directory> is handled (more later).
> 
> This is bogus, and I think that the main server's sections should be
> ordered before the virtual host's.  This allows virtual host settings
> to override main server settings (which is how we typically implement
> everything else).
> 
> Now, on to <Directory>.  A brief recap of how directory_walk works,
> excuse the syntax:
> 
>     per_dir_config = server->lookup_defaults;
>     test_filename = "";
>     for each component C in r->filename
>         test_filename = test_filename . C;
> 
>         for each <Directory> section S
>             if test_filename matches S then
>                 merge( per_dir_config, S );
>             endif
>         endfor
> 
>         if overrides allowed then
>             test for .htaccess, parse it and merge it if it exists
>         endif
>     endfor
> 
> You'll observe first of all that if you have N <Directory> sections and
> on average your file paths have M components then this is an O(N*M) loop,
> kinda yucky.
> 
> I want to optimize this so that we sort the <Directory> sections by the
> number of components in their string.  We preserve order of course, so
> that if:
> 
>     <Directory /a/b>
>         ... foo1
>     </Directory>
> 
>     <Directory /a/b>
>         ... foo2
>     </Directory>
> 
> appears in the config then they'll remain in that order once sorted into
> the "two components" list.
> 
> With that optimization the loop is O(N+M).
> 
> There are two complications:  *? wildcarding, and regular expressions.
> 
> *? wildcarding is easy to deal with if we decree that wildcards don't
> cross component boundaries (they currently do, see the comment right
> before strcmp_match).  This would match filesystem/shell semantics
> more closely.
> 
> Regexs on the other hand I would like to process *after* doing the
> component by component matches above.  In effect this makes <Directory ~ foo>
> very similar to <Files ~ foo>.  One reason I want to do this is that
> currently, there's this oddball behaviour:
> 
> <Directory ~ abc>
>     ... whatever
> </Directory>
> 
> Then consider a request to /abc/def/foo.html.  That directory section
> will be matched and merged (at least) twice.  i.e.
> 
> /                       no match
> /docroot                no match
> /docroot/abc            match, merge
> /docroot/abc/def        match, merge
> 
> Anyone here use Directory regexs?  How would this impact you?
> 
> I don't want to start coding this until I'm sure people are happy with
> the change.  It's changes in some of the more subtle aspects of our
> config, stuff people probably don't use a lot.
> 
> Dean
> 
>