Posted to dev@httpd.apache.org by Bill Stoddard <bi...@wstoddard.com> on 2001/04/07 22:55:00 UTC

Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

I think the handling of Location directives, particularly the SetHandler directive inside
a Location directive is broken. Or perhaps it never worked the way I intuit it should
work.  Why should we run directory_walk() if a location_walk() has identified a handler
for the request???

From my earlier example...

In httpd.conf I have enabled mod_status thusly...
<Location /server-status>
SetHandler server-status
</Location>

A request for http://mymachine/server-status will result in two stats:

stat 1 - %DocumentRoot%/server-status
stat 2 - %DocumentRoot%

Both are in get_path_info() as part of directory_walk().  Why are we even in
directory_walk()???

location_walk() is run BEFORE directory_walk() (see process_request_internal() in
http_request.c).  Seems if we find a handler for the request during the first
location_walk, we should skip the call to directory_walk(), no?  Skipping directory_walk
would avoid the stat calls.
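A minimal sketch of the ordering proposed above, in C. Everything here is hypothetical: mock_request, mock_location_walk(), mock_directory_walk() and mock_process_request() are simplified stand-ins, not the real request_rec or the http_request.c functions, and the directory walk is simply charged the two stats described above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for request_rec. */
typedef struct {
    const char *uri;
    const char *handler;  /* set by a <Location> SetHandler, else NULL */
    int stat_calls;       /* filesystem stats this request would cost */
} mock_request;

/* Stand-in for location_walk(): one <Location /server-status> block,
 * matched with a simple prefix test for brevity. */
static void mock_location_walk(mock_request *r)
{
    if (strncmp(r->uri, "/server-status", strlen("/server-status")) == 0)
        r->handler = "server-status";
}

/* Stand-in for directory_walk(): charged a flat two stats, the count
 * observed for /server-status (%DocumentRoot%/server-status, then
 * %DocumentRoot%). */
static void mock_directory_walk(mock_request *r)
{
    r->stat_calls += 2;
}

/* Proposed ordering: skip the directory walk once a handler is known. */
static void mock_process_request(mock_request *r)
{
    assert(r != NULL && r->uri != NULL);
    mock_location_walk(r);
    if (r->handler == NULL)
        mock_directory_walk(r);
}
```

The catch, as dean gaudet points out later in the thread, is that skipping the walk also skips any access control configured on or above the DocumentRoot.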

One other observation that suggests an alternate solution... the very beginning of
directory_walk() has this check.......

    /*
     * Are we dealing with a file? If not, we can (hopefully) safely assume we
     * have a handler that doesn't require one, but for safety's sake, and so
     * we have something find_types() can get something out of, fake one. But
     * don't run through the directory entries.
     */

    if (r->filename == NULL) {
        r->filename = ap_pstrdup(r->pool, r->uri);
        r->finfo.st_mode = 0;   /* Not really a file... */
        r->per_dir_config = per_dir_defaults;

        return OK;
    }

This suggests that perhaps the translate hook for mod_status should identify that the
request is not a file and set r->filename = NULL.  I am looking at Apache 1.3 but all this
discussion should apply to Apache 2.0 as well.
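A sketch of that alternative, again with hypothetical mocks: a translate hook (status_translate()) that claims /server-status and leaves filename NULL, so the early-return branch of directory_walk() quoted above fires instead of a real walk. The struct, the hook names and the /usr/local/apache/htdocs document root are assumptions for illustration; only the OK and DECLINED values follow Apache 1.3.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OK        0   /* values as in Apache 1.3 */
#define DECLINED -1

typedef struct {
    const char *uri;
    const char *filename;   /* NULL means "not a file" */
    char buf[256];          /* storage for a translated filename */
    int walked_dirs;        /* 1 if the per-directory walk really ran */
} mock_request;

/* Hypothetical mod_status translate hook: claim the URI, leave filename NULL. */
static int status_translate(mock_request *r)
{
    if (strcmp(r->uri, "/server-status") != 0)
        return DECLINED;
    r->filename = NULL;
    return OK;
}

/* Default translation under an assumed DocumentRoot. */
static int core_translate(mock_request *r)
{
    snprintf(r->buf, sizeof r->buf, "/usr/local/apache/htdocs%s", r->uri);
    r->filename = r->buf;
    return OK;
}

/* Run hooks in order until one does not decline. */
static void translate(mock_request *r)
{
    assert(r != NULL && r->uri != NULL);
    if (status_translate(r) == DECLINED)
        core_translate(r);
}

/* Mirrors the early return at the top of directory_walk(). */
static int mock_directory_walk(mock_request *r)
{
    if (r->filename == NULL) {
        r->filename = r->uri;   /* fake one for find_types() */
        return OK;              /* but don't walk directory entries */
    }
    r->walked_dirs = 1;         /* stats, .htaccess reads, ... */
    return OK;
}
```

With the hook in place, /server-status never reaches the stat-heavy part of the walk, while ordinary URIs still get a filename under the document root.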

Bill


Re: Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

Posted by Greg Ames <gr...@remulak.net>.
Greg Stein wrote:
> 
> On Sat, Apr 07, 2001 at 08:54:51PM -0500, William A. Rowe, Jr. wrote:
> >...
> > For 2.1 (or 3.0) this nonsense must change.  If I say <Location /> SetHandler 
> >   SQLSpace-handler
> > then the entire file-system part of httpd needs to just _disappear_.
> 
> You got that right. After doing the stuff for mod_dav's arbitrary backends,
> I've been thinking on the right form for Apache for quite a while. Rather
> than respond to the items in your post, I'll briefly explain how I believe
> we should be doing the processing:
> 
>     Apache would keep a tree which corresponds to the URL space. The root of
>     the tree corresponds to "/" on the virtual server. Children of each node
>     are a hash table, keyed by the URI component. Nodes only exist where
>     configuration has occurred -- the node configuration typically means
>     children exist, but they simply aren't part of the URL-space tree.
> 

This has the Ring of Truth to it.  Clearly, not in 1.3; definitely in
2.1.  For 2.0, I'd say it depends on how nasty the gory details look.

> 
> Having the tree in memory means that we can quickly map a URL to a resource.

...love it!  It does sound faster, at least from 20,000 feet.

Greg

Re: Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

Posted by Bill Stoddard <bi...@wstoddard.com>.
> On Sat, Apr 07, 2001 at 08:54:51PM -0500, William A. Rowe, Jr. wrote:
> >...
> > For 2.1 (or 3.0) this nonsense must change.  If I say <Location /> SetHandler SQLSpace-handler
> > then the entire file-system part of httpd needs to just _disappear_.
>
> You got that right. After doing the stuff for mod_dav's arbitrary backends,
> I've been thinking on the right form for Apache for quite a while. Rather
> than respond to the items in your post, I'll briefly explain how I believe
> we should be doing the processing:
>

I don't grok this explanation...

>     Apache would keep a tree which corresponds to the URL space.

When is this tree built? At server start or dynamically? Based on what directives? Location? Alias?
Rewrite rules?  A simple first approach (and perhaps the only approach) is to define the URL space
as being -defined- by Location directives.

> The root of
>     the tree corresponds to "/" on the virtual server. Children of each node
>     are a hash table, keyed by the URI component. Nodes only exist where
>     configuration has occurred -- the node configuration typically means
>     children exist, but they simply aren't part of the URL-space tree.
>

Please give a simple example :-)

>     One of the config options at a node is a directory/file path (the Alias
>     directive). Alternatively, the node could point to a database of
>     content, or to a custom module, or whatever. The basic point is that
>     each node says "<this> function/module/handler specifies the URL space
>     for my children (which are not explicitly listed in the node's
>     children)." (note: it only translates URLs into an internal resource
>     identifier; the handler for a resource is still separate)
>
>     Thus, if we see /foo, the system looks at the tree, finds "/" and that
>     it is mapped to DIR in the filesystem. However, we also see "foo" as a
>     child, so we go to that node. That node points to a custom module for
>     that space. If we see "/bar", then we do a directory walk to bar.
>
>     I believe that we'd want to look for a .htaccess in "/" since that *is*
>     the parent resource of /foo, even though /foo is handled specially (it may
>     want to establish security policy for everything in "/"). Of course, the
>     config for "/" could disable .htaccess files.
>
>     etc.
>
> Having the tree in memory means that we can quickly map a URL to a resource.
> Currently, we have linear lists of directories, locations, and files that we
> need to scan. The tree is also much nicer from an introspection point -- we
> can navigate it to discover what the URL space looks like. This is actually
> a HUGE bonus for DAV's PROPFIND method. At the moment, I return results for
> a PROPFIND based on what I see in the filesystem (imagine PROPFIND to be an
> "ls" for the web server). If somebody has attached some kind of URL space in
> there, the PROPFIND doesn't return it. However, with the tree, I can see
> those children and return results for them.
>
> The trees would be per-server, and (sub)trees could be shared.
>
> But whatever... this is a big restructuring best left post-2.0.
>
> Cheers,
> -g
>
> --
> Greg Stein, http://www.lyra.org/
>


Re: Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

Posted by Greg Stein <gs...@lyra.org>.
On Sat, Apr 07, 2001 at 08:54:51PM -0500, William A. Rowe, Jr. wrote:
>...
> For 2.1 (or 3.0) this nonsense must change.  If I say <Location /> SetHandler SQLSpace-handler
> then the entire file-system part of httpd needs to just _disappear_.

You got that right. After doing the stuff for mod_dav's arbitrary backends,
I've been thinking on the right form for Apache for quite a while. Rather
than respond to the items in your post, I'll briefly explain how I believe
we should be doing the processing:

    Apache would keep a tree which corresponds to the URL space. The root of
    the tree corresponds to "/" on the virtual server. Children of each node
    are a hash table, keyed by the URI component. Nodes only exist where
    configuration has occurred -- the node configuration typically means
    children exist, but they simply aren't part of the URL-space tree.

    One of the config options at a node is a directory/file path (the Alias
    directive). Alternatively, the node could point to a database of
    content, or to a custom module, or whatever. The basic point is that
    each node says "<this> function/module/handler specifies the URL space
    for my children (which are not explicitly listed in the node's
    children)." (note: it only translates URLs into an internal resource
    identifier; the handler for a resource is still separate)

    Thus, if we see /foo, the system looks at the tree, finds "/" and that
    it is mapped to DIR in the filesystem. However, we also see "foo" as a
    child, so we go to that node. That node points to a custom module for
    that space. If we see "/bar", then we do a directory walk to bar.

    I believe that we'd want to look for a .htaccess in "/" since that *is*
    the parent resource of /foo, even though /foo is handled specially (it may
    want to establish security policy for everything in "/"). Of course, the
    config for "/" could disable .htaccess files.
    
    etc.
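A minimal C sketch of the tree lookup described above, under stated assumptions: url_node, node_new(), node_add(), node_find() and url_lookup() are hypothetical names, and a sibling list stands in for the per-node hash table to keep it short. The lookup descends one URI component at a time and returns the deepest configured node plus the unconsumed tail, which that node's mapper would then resolve (for example by a directory walk).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical URL-space node; the proposal keys children in a hash
 * table, a first-child/next-sibling list keeps this sketch short. */
typedef struct url_node {
    char name[64];            /* URI component; "" for the root */
    const char *mapper;       /* e.g. "filesystem", "custom-module" */
    struct url_node *child;   /* first child */
    struct url_node *next;    /* next sibling */
} url_node;

static url_node *node_new(const char *name, const char *mapper)
{
    url_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    strncpy(n->name, name, sizeof n->name - 1);  /* calloc zeroed the NUL */
    n->mapper = mapper;
    return n;
}

static void node_add(url_node *parent, url_node *kid)
{
    assert(parent != NULL && kid != NULL);
    kid->next = parent->child;
    parent->child = kid;
}

/* Find the child whose name equals the len-byte component at name. */
static url_node *node_find(const url_node *parent, const char *name, size_t len)
{
    for (url_node *c = parent->child; c != NULL; c = c->next)
        if (strlen(c->name) == len && strncmp(c->name, name, len) == 0)
            return c;
    return NULL;
}

/* Walk the URI component by component; the deepest configured node wins.
 * *rest receives the unconsumed tail for that node's mapper. */
static const url_node *url_lookup(const url_node *root, const char *uri,
                                  const char **rest)
{
    const url_node *cur = root;
    const char *p = uri;
    while (*p == '/') {
        const char *start = p + 1;
        size_t len = strcspn(start, "/");
        const url_node *kid = node_find(cur, start, len);
        if (kid == NULL)
            break;
        cur = kid;
        p = start + len;
    }
    *rest = p;
    return cur;
}
```

With the root mapped to the filesystem and a child foo mapped to a custom module, /foo/x lands on foo with tail /x, while /bar falls back to the root with tail /bar, matching the /foo and /bar cases above.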

Having the tree in memory means that we can quickly map a URL to a resource.
Currently, we have linear lists of directories, locations, and files that we
need to scan. The tree is also much nicer from an introspection point -- we
can navigate it to discover what the URL space looks like. This is actually
a HUGE bonus for DAV's PROPFIND method. At the moment, I return results for
a PROPFIND based on what I see in the filesystem (imagine PROPFIND to be an
"ls" for the web server). If somebody has attached some kind of URL space in
there, the PROPFIND doesn't return it. However, with the tree, I can see
those children and return results for them.
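The introspection benefit can be sketched the same way. Assuming a hypothetical url_node with first-child/next-sibling links, list_children() below just reports the configured children of one node; a PROPFIND handler could merge that list with what the filesystem reports. This is an illustration, not mod_dav code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical URL-space node with first-child / next-sibling links. */
typedef struct url_node {
    const char *name;
    struct url_node *child;   /* first configured child */
    struct url_node *next;    /* next sibling */
} url_node;

/* Write the configured children of dir into buf, one name per line,
 * and return how many fit.  These are URL-space entries that a purely
 * filesystem-based PROPFIND would never see. */
static int list_children(const url_node *dir, char *buf, size_t size)
{
    size_t used = 0;
    int count = 0;
    assert(dir != NULL && buf != NULL && size > 0);
    buf[0] = '\0';
    for (const url_node *c = dir->child; c != NULL; c = c->next) {
        size_t len = strlen(c->name);
        if (used + len + 2 > size)    /* name + '\n' + NUL must fit */
            break;
        memcpy(buf + used, c->name, len);
        buf[used + len] = '\n';
        used += len + 1;
        buf[used] = '\0';
        count++;
    }
    return count;
}
```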

The trees would be per-server, and (sub)trees could be shared.

But whatever... this is a big restructuring best left post-2.0.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

Posted by "William A. Rowe, Jr." <ad...@rowe-clan.net>.
From: "dean gaudet" <dg...@arctic.org>
Sent: Sunday, April 08, 2001 12:15 PM


> On Sat, 7 Apr 2001, William A. Rowe, Jr. wrote:
> 
> > And the worst change of all, if we are going to allow this shifting
> > from filesystem to the URI space, we need the location handler to
> > start recognizing that the namespace (filesystem) has mapped a
> > /foo///bar/ location to the /foo/bar/ location, and _STOP_ letting
> > people walk around such escaped locations.
> 
> unfortunately "/foo///bar/" is a different uri from "/foo/bar/", and if
> "/foo" is a mount-point for a handler which takes various options in the
> form of sub-path components then "///bar/" would have 3 components, 2 of
> which are empty.

I concur.  The issue is this ... if /foo maps into filesystem space (and none
of the foo/ foo// foo/// or foo///bar URI's map the user back out of filesystem
space) then the location is tested at each of these (and no-ops), and should
finally resolve to location /foo/bar after the filesystem 'accepts' /foo/bar
as the request to serve.

While these are different URI's - the resource they serve results in a specific
URI.  Let's use the request for /foo/bar where the (case insensitive) /Foo/Bar
exists... I'm suggesting the tests ought to sequence

<Location  /> [Has a DocumentRoot in filesystem space, so...]
<Directory /> 
<Location  /foo> [Isn't overridden]
<Directory /Foo> [Folded to /Foo before testing]
<Location  /foo/> [Isn't overridden]
<Directory /Foo/> [Is the FS entity]
<Location  /foo//> [Isn't overridden]
<Directory /Foo/>   [Corrected to /Foo/ -- we ought to optimize out]
<Location  /foo///> [Isn't overridden]
<Directory /Foo/>   [Corrected to /Foo/ -- we ought to optimize out]
<Location  /foo///bar> [Isn't overridden]
<Directory /Foo/Bar>   [Folded to Bar before testing]

Since /foo/bar is accepted as servable by the filesystem space...

<Location  /Foo/Bar>  [Closing Location test we have today, but as-mapped]
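The "[Corrected to /Foo/]" steps amount to collapsing repeated slashes once the filesystem has accepted the path. httpd has a helper along these lines, ap_no2slash(); the collapse_slashes() below is a hypothetical standalone sketch of the same idea, not that function.

```c
#include <assert.h>
#include <string.h>

/* Collapse runs of '/' into a single '/', in place:
 * "/foo///bar/" -> "/foo/bar/".  Only safe once the path is known to
 * be in filesystem space; a non-filesystem handler may treat empty
 * components as significant. */
static void collapse_slashes(char *path)
{
    char *out = path;
    assert(path != NULL);
    for (const char *in = path; *in != '\0'; in++) {
        if (*in == '/' && out > path && out[-1] == '/')
            continue;          /* drop the repeated slash */
        *out++ = *in;
    }
    *out = '\0';
}
```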

The specifics of what goes on in Directory may change in the context of the
filesystem [unix, samba, win32, os2, netware etc] that we are serving. It
should be pointed out that a fs could _accept_ /foo///bar as semantically
significant for its structure.

But in the case that /foo///bar was a location that mapped to server-info,
the final pass would accept:

<Location  /foo///bar>

and would never parse <Directory /Foo/Bar> nor correct 'the location' 
and test <Location /Foo/Bar> since it isn't in filesystem space.

The counter-argument goes that once we decided that /foo -> /Foo, the user
had no more flexibility, and <Location /foo///> should be tested as /Foo/, since
all of those elements fell into filesystem space.  I'm not strongly bent either way.

We have a long conversation to go before we choose the 'one correct way'
to handle this.

Bill





Re: Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

Posted by dean gaudet <dg...@arctic.org>.
On Sat, 7 Apr 2001, William A. Rowe, Jr. wrote:

> For 2.1 (or 3.0) this nonsense must change.  If I say <Location />
> SetHandler SQLSpace-handler then the entire file-system part of httpd
> needs to just _disappear_.
>
> This requires a number of non-trivial changes:

btw, there are other file-system specific dependencies in
mod_negotiation... there's really two parts of negotiation.  one is the
discovery of what alternatives there are for a document, and the other is
the calculation of what to respond with given the list of alternatives.

it'd be nice if the second part was object-storage independent... but
that'll require a bit of work to do it efficiently -- it's possibly not
efficient to just have the object-store list all the alternatives.

> And the worst change of all, if we are going to allow this shifting
> from filesystem to the URI space, we need the location handler to
> start recognizing that the namespace (filesystem) has mapped a
> /foo///bar/ location to the /foo/bar/ location, and _STOP_ letting
> people walk around such escaped locations.

unfortunately "/foo///bar/" is a different uri from "/foo/bar/", and if
"/foo" is a mount-point for a handler which takes various options in the
form of sub-path components then "///bar/" would have 3 components, 2 of
which are empty.
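To make the component count concrete: if /foo is the mount point, the path info left for /foo///bar/ is ///bar/. The hypothetical count_components() below splits it as described, treating each '/' as the start of a component and a final '/' as a trailing slash that opens nothing.

```c
#include <assert.h>
#include <string.h>

/* Count the components of a path-info string such as "///bar/".
 * Each '/' opens a component that runs to the next '/'; a '/' at the
 * very end is a trailing slash.  *empty receives how many components
 * were zero-length.  path_info is expected to start with '/' (or be ""). */
static int count_components(const char *path_info, int *empty)
{
    int count = 0;
    assert(path_info != NULL && empty != NULL);
    *empty = 0;
    for (const char *p = path_info; *p == '/' && p[1] != '\0'; ) {
        size_t len = strcspn(p + 1, "/");
        count++;
        if (len == 0)
            (*empty)++;
        p += 1 + len;          /* move to the next '/' or the end */
    }
    return count;
}
```

For "///bar/" this gives 3 components, 2 of them empty; collapsing the slashes first would leave a handler mounted at /foo with a single component, which is exactly the information dean's handler would lose.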

-dean


Re: Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

Posted by "William A. Rowe, Jr." <ad...@rowe-clan.net>.
From: "Bill Stoddard" <bi...@wstoddard.com>
Sent: Saturday, April 07, 2001 3:55 PM


> I think the handling of Location directives, particularly the SetHandler directive inside
> a Location directive is broken. Or perhaps it never worked the way I intuit it should
> work.  Why should we run directory_walk() if a location_walk() has identified a handler
> for the request???
> 
> In httpd.conf I have enabled mod_status thusly...
> <Location /server-status>
> SetHandler server-status
> </Location>
> 
> Why are we even in directory_walk()???
> 
> location_walk() is run BEFORE directory_walk() (see process_request_internal() in
> http_request.c).  Seems if we find a handler for the request during the first
> location_walk, we should skip the call to directory_walk(), no?  Skipping directory_walk
> would avoid the stat calls.

From the docs http://httpd.apache.org/docs/mod/core.html#location

"<Location> sections are processed in the order they appear in the configuration file, 
after the <Directory> sections and .htaccess files are read, and after the <Files> sections."

and from http://httpd.apache.org/docs/sections.html

"Another note: There is actually a <Location>/<LocationMatch> sequence performed just before 
the name translation phase (where Aliases and DocumentRoots are used to map URLs to filenames).
The results of this sequence are completely thrown away after the translation has completed."

> This suggests that perhaps the translate hook for mod_status should identify that the
> request is not a file and set r->filename = NULL.  I am looking at Apache 1.3 but all this
> discussion should apply to Apache 2.0 as well.

I'm --1 on changing the parsing in 1.3, ever, not even a little.  You will _break_ a ton of
security that was carefully audited in users' environments.

For 2.1 (or 3.0) this nonsense must change.  If I say <Location /> SetHandler SQLSpace-handler
then the entire file-system part of httpd needs to just _disappear_.

This requires a number of non-trivial changes:

  . Location parsing must occur before Directory parsing

  . Directory walking/parsing must leave the core, and move out to a filesystem module

  . When a Location directive contains a DocumentRoot it passes control out of <location >
    into the filesystem <directory > namespace.

  . For legacy, a DocumentRoot outside of a <Location > block is parsed as if it were
    given in the <Location /> block.

Fortunately, AllowOverride directives live only in <Directory > blocks, and will continue to behave in 
the correct manner.  The stat for %DocumentRoot%/.htaccess before the /server-status/ location
is tested is always correct if the root of the namespace is a filesystem space.  But it's
entirely non-extensible.  I believe (strongly) that we need an additional set of changes:

  . An Overrides hook that handles the parsing for .htaccess files.  This is abstracted away
    from its current residence in dir_walk().  It allows the overriding to be done by new
    Overrides handlers sitting in, say, a DBM rather than in .htaccess files.

  . Configuration of the 'Overrides' provider.

  . Implicit assumption (for compatibility) that a filesystem space has the .htaccess 
    Overrides handler if the DocumentRoot was used to map a location (even /) into the
    filesystem space.

  . No invocation of the Overrides hook when AllowOverride is set to None.
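A minimal sketch of the proposed Overrides hook as a provider interface, assuming hypothetical names throughout (overrides_provider, run_overrides(), the two sample providers and the /www/private directory). It shows two of the properties listed above: the source of overrides is pluggable, and the hook is not invoked at all under AllowOverride None.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "Overrides" provider: where per-directory overrides
 * come from (.htaccess files, a DBM, ...) is pluggable. */
typedef struct {
    const char *name;
    /* Returns the override text for dir, or NULL if there is none. */
    const char *(*read_overrides)(const char *dir);
} overrides_provider;

static int htaccess_reads;   /* counts simulated .htaccess lookups */

static const char *htaccess_read(const char *dir)
{
    (void)dir;
    htaccess_reads++;        /* a real provider would stat/open here */
    return NULL;             /* pretend no .htaccess exists */
}

static const char *dbm_read(const char *dir)
{
    /* stand-in for a DBM lookup keyed by directory */
    return strcmp(dir, "/www/private") == 0 ? "Require valid-user" : NULL;
}

static const overrides_provider htaccess_provider = { "htaccess", htaccess_read };
static const overrides_provider dbm_provider = { "dbm", dbm_read };

/* The hook itself: skipped entirely when AllowOverride None applies. */
static const char *run_overrides(const overrides_provider *prov,
                                 int allow_override_none, const char *dir)
{
    assert(prov != NULL && dir != NULL);
    if (allow_override_none)
        return NULL;
    return prov->read_overrides(dir);
}
```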

And the worst change of all, if we are going to allow this shifting from filesystem to the
URI space, we need the location handler to start recognizing that the namespace (filesystem)
has mapped a /foo///bar/ location to the /foo/bar/ location, and _STOP_ letting people walk
around such escaped locations.


None of these changes is simple, and there are no one-liner patches to solve it.  It's
a terribly thorny set of problems.

The coming change to parsing will at least assure that the %DocumentRoot% isn't stat'ed again
once it's canonicalized at startup.  Then AllowOverride None should start working much more
consistently with what you expect, as soon as a directory namespace cache is put into the
server.

Bill



Re: Are Location directives broken (or why run directory_walk if location_walk finds a handler?)

Posted by dean gaudet <dg...@arctic.org>.
On Sat, 7 Apr 2001, Bill Stoddard wrote:

> Why should we run directory_walk() if a location_walk() has identified a handler
> for the request???

only reason i can think of is because the "/" url has always been
considered to be in the filesystem, and so you have to run directory_walk
to get any access control/auth which has been set on or above the docroot.

i'm not saying this is a good design.

the best design i can think of is to virtualise the entire urlspace and
use "mount points" to mount subtree handlers.

oh and get rid of all the wildcard and regex crap.  it's a waste of cycles
:)

-dean