Posted to dev@httpd.apache.org by Allen Pulsifer <pu...@comcast.net> on 2007/06/11 15:01:33 UTC

httpd attempts to open file.html/.htaccess (is this a bug?)

Summary:

When processing a "GET /.../file.html", Apache httpd briefly treats
file.html as a directory and attempts to open
"docroot/.../file.html/.htaccess".  The os returns ENOTDIR, and then
processing of the request continues.

There would seem to be no reason for httpd to attempt to open
file.html/.htaccess, especially since it has already done a stat on
file.html and knows that it is a file, not a directory.

Does anyone else see the same behavior?  Is this a bug?


Configuration: Apache httpd v 2.2.4 running on a default installation of
CentOS-5 (ext3 filesystem).  Tested with stock configuration distributed
with CentOS-5, as well as a stock installation compiled from the source.

Only change to httpd.conf is:
"AllowOverride None" changed to "AllowOverride All"

DocumentRoot is "/var/www/html", and I created an html file at
/var/www/html/dir/subdir/file.html

Command is "curl -i http://localhost/dir/subdir/file.html"

Result of "strace -f -e trace=file /usr/sbin/httpd":

[pid 24550] stat64("/var/www/html/dir/subdir/file.html",
{st_mode=S_IFREG|0644, st_size=48, ...}) = 0

[pid 24550] open("/var/www/html/.htaccess", O_RDONLY|O_LARGEFILE) = -1
ENOENT (No such file or directory)

[pid 24550] open("/var/www/html/dir/.htaccess", O_RDONLY|O_LARGEFILE) = -1
ENOENT (No such file or directory)

[pid 24550] open("/var/www/html/dir/subdir/.htaccess", O_RDONLY|O_LARGEFILE)
= -1 ENOENT (No such file or directory)

[pid 24550] open("/var/www/html/dir/subdir/file.html/.htaccess",
O_RDONLY|O_LARGEFILE) = -1 ENOTDIR (Not a directory)

[pid 24550] open("/var/www/html/dir/subdir/file.html", O_RDONLY|O_LARGEFILE)
= 18


Note the open("/var/www/html/dir/subdir/file.html/.htaccess"), which
returned ENOTDIR.


RE: httpd attempts to open file.html/.htaccess (is this a bug?)

Posted by Allen Pulsifer <pu...@comcast.net>.
Hello William,

Thanks for the suggestions.  I have a fix that is pretty simple (and
therefore, I hope, unlikely to break anything ;-).  Later today, after I've
compiled and tested it on both Windows and Linux, I'll post it to the list.

Allen


Re: httpd attempts to open file.html/.htaccess (is this a bug?)

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Allen Pulsifer wrote:
> 
> Hello Paul and Dev List,
> 
> Thanks for the reply.  I checked out the links and did some code tracing
> with the debugger.  As one of the links pointed out, the problem is in the
> block of code attached below from ap_directory_walk() in server/request.c

just a quick note to thank you, Allen, for the most thorough analysis of
the optimizations of dir_walk.  I'm partially to blame (followed by others
who attempted to optimize further :-) and would love to see an optimization
model which is more generic, e.g. not entangled with the specifics of
'I'm for directories' or 'I'm for patterns'...  It's great to have your
reference to help debug and to correct the functioning of dir_walk, and we
hope you'll participate in testing/confirming any proposed fixes.

My thought for the next step is to divide dir_walk into cache code (was this
opaque pattern hit before?) and dir/file handling code, with fixes (which
your patch suggests), and perhaps even to split REG vs. DIR handling cleanly
into separate phases.

We are open to all suggestions.

RE: httpd attempts to open file.html/.htaccess (is this a bug?)

Posted by Allen Pulsifer <pu...@comcast.net>.
> > When processing a "GET /.../file.html", Apache httpd briefly treats 
> > file.html as a directory and attempts to open 
> > "docroot/.../file.html/.htaccess".  The os returns ENOTDIR, 
> and then 
> > processing of the request continues.

> Yes, this is a somewhat known issue.  Previously it caused 
> issues with earlier versions of reiserfs4: 
> http://issues.apache.org/bugzilla/show_bug.cgi?id=31126
> 
> Rici explains more details here: 
> http://marc.info/?l=apache-httpd-dev&m=109470495819687&w=4


Hello Paul and Dev List,

Thanks for the reply.  I checked out the links and did some code tracing
with the debugger.  As one of the links pointed out, the problem is in the
block of code attached below from ap_directory_walk() in server/request.c

This block of code is contained in the directory walk that looks for
symlinks and .htaccess files.  It is executed immediately after appending the
next path segment, which is either a subdirectory or the file name.

The "if" test at the top of the block attempts to optimize by skipping the
statements that follow it.  The comment on the "if" test states:

  * If...we knew r->filename was a file, and
  * if...we have strict (case-sensitive) filenames, or
  *      we know the canonical_filename matches to _this_ name, and
  * if...we have allowed symlinks
  * skip the lstat and dummy up an APR_DIR value for thisinfo.

The first problem with the "if" test is that it doesn't recognize when the
segment is actually the file name itself, and therefore the type is APR_REG
rather than APR_DIR.  This could easily be fixed, but there may be a few
other problems.

First, it should be mentioned that the optimization can be removed and then
httpd will behave correctly: it will not do a spurious access on
file.html/.htaccess.  However, when the optimization is removed, it will
then do a stat on each component in the file path, when it might not need
to.

Let's first look at the lines of code that follow the "if" optimization and
look at the conditions under which they are not necessary.  Before starting
though, let's note that prior to beginning the directory walk,
ap_directory_walk() does a stat on the full file name, using the
APR_FINFO_MIN parameter.

Later, without optimization, it would then do a stat on each component in
the path, as follows:

1. Do a stat on the path component, looking at the link info
(APR_FINFO_LINK) rather than the target info.

2. Test if stat returned an error.  Note that since the initial stat on the
full path did not return an error, the stat on the component will never
return an error (assuming the program logic is correct).  This can therefore
always be optimized out.

3. Fix up the path name if the actual component name info does not match.  A
mismatch is only possible with a file system that is not case sensitive, and
therefore can be optimized out if either (a) the file system is case
sensitive or (b) we already know they match; or (c) we don't care if they
match or not.

4. If the path is a link, run resolve_symlink().  This function will always
return success when OPT_SYM_LINKS (FollowSymLinks) is enabled.

5. If the path points at anything other than a directory, end processing.

So basically, these processing steps can be skipped whenever (1)
FollowSymLinks is enabled AND (2) the file system is case sensitive.

It seems to me that the optimization should actually read:

If (filesystem is case sensitive AND OPT_SYM_LINKS is enabled AND we did a
successful stat on the full file path) Then:
{
	If (the path to test is the full path AND the full path points at a
	regular file) Then: end processing

	Else: assume the path to test is a dir and skip the stat
}

These are the two things I'm concerned about:

1. In the current optimization, the comment says:

  * if...we have strict (case-sensitive) filenames, or
  *      we know the canonical_filename matches to _this_ name, and

while the actual code says:

#ifdef CASE_BLIND_FILESYSTEM
                && (filename_len <= canonical_len)
#endif

At first examination, it looks like the comment describes the correct
implementation, but how does the test for "filename_len <= canonical_len"
ensure that "canonical_filename matches to _this_ name"?  Can anyone verify
that this is correct?

2. When OPT_SYM_LINKS is enabled, resolve_symlink() does not test
OPT_SYM_OWNER, i.e., OPT_SYM_LINKS overrides OPT_SYM_OWNER.  The
optimization however insists that OPT_SYM_LINKS is set while OPT_SYM_OWNER
is unset.

Which of these two is correct?  Should resolve_symlink() always check
OPT_SYM_OWNER, even if OPT_SYM_LINKS is enabled, or should the optimization
only check OPT_SYM_LINKS?

Thanks,

Allen


---------

THE PROBLEMATIC BLOCK OF CODE

            /* First optimization;
             * If...we knew r->filename was a file, and
             * if...we have strict (case-sensitive) filenames, or
             *      we know the canonical_filename matches to _this_ name, and
             * if...we have allowed symlinks
             * skip the lstat and dummy up an APR_DIR value for thisinfo.
             */
            if (r->finfo.filetype
#ifdef CASE_BLIND_FILESYSTEM
                && (filename_len <= canonical_len)
#endif
                && ((opts.opts & (OPT_SYM_OWNER | OPT_SYM_LINKS)) == OPT_SYM_LINKS))
            {

                thisinfo.filetype = APR_DIR;
                ++seg;
                continue;
            }

            /* We choose apr_stat with flag APR_FINFO_LINK here, rather than
             * plain apr_stat, so that we capture this path object rather than
             * its target.  We will replace the info with our target's info
             * below.  We especially want the name of this 'link' object, not
             * the name of its target, if we are fixing the filename
             * case/resolving aliases.
             */
            rv = apr_stat(&thisinfo, r->filename,
                          APR_FINFO_MIN | APR_FINFO_NAME | APR_FINFO_LINK,
                          r->pool);

            if (APR_STATUS_IS_ENOENT(rv)) {
                /* Nothing?  That could be nice.  But our directory
                 * walk is done.
                 */
                thisinfo.filetype = APR_NOFILE;
                break;
            }
            else if (APR_STATUS_IS_EACCES(rv)) {
                ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
                              "access to %s denied", r->uri);
                return r->status = HTTP_FORBIDDEN;
            }
            else if ((rv != APR_SUCCESS && rv != APR_INCOMPLETE)
                     || !(thisinfo.valid & APR_FINFO_TYPE)) {
                /* If we hit ENOTDIR, we must have over-optimized, deny
                 * rather than assume not found.
                 */
                ap_log_rerror(APLOG_MARK, APLOG_ERR, rv, r,
                              "access to %s failed", r->uri);
                return r->status = HTTP_FORBIDDEN;
            }

            /* Fix up the path now if we have a name, and they don't agree
             */
            if ((thisinfo.valid & APR_FINFO_NAME)
                && strcmp(seg_name, thisinfo.name)) {
                /* TODO: provide users an option that an internal/external
                 * redirect is required here?  We need to walk the URI and
                 * filename in tandem to properly correlate these.
                 */
                strcpy(seg_name, thisinfo.name);
                filename_len = strlen(r->filename);
            }

            if (thisinfo.filetype == APR_LNK) {
                /* Is this a possibly acceptable symlink?
                 */
                if ((res = resolve_symlink(r->filename, &thisinfo,
                                           opts.opts, r->pool)) != OK) {
                    ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                                  "Symbolic link not allowed "
                                  "or link target not accessible: %s",
                                  r->filename);
                    return r->status = res;
                }
            }

            /* Ok, we are done with the link's info, test the real target
             */
            if (thisinfo.filetype == APR_REG ||
                thisinfo.filetype == APR_NOFILE) {
                /* That was fun, nothing left for us here
                 */
                break;
            }
            else if (thisinfo.filetype != APR_DIR) {
                ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                              "Forbidden: %s doesn't point to "
                              "a file or directory",
                              r->filename);
                return r->status = HTTP_FORBIDDEN;
            }

            ++seg;
        } while (thisinfo.filetype == APR_DIR);
---------


Re: httpd attempts to open file.html/.htaccess (is this a bug?)

Posted by Paul Querna <ch...@force-elite.com>.
Allen Pulsifer wrote:
> Summary:
> 
> When processing a "GET /.../file.html", Apache httpd briefly treats
> file.html as a directory and attempts to open
> "docroot/.../file.html/.htaccess".  The os returns ENOTDIR, and then
> processing of the request continues.

Yes, this is a somewhat known issue.  Previously it caused issues with
earlier versions of reiserfs4:
http://issues.apache.org/bugzilla/show_bug.cgi?id=31126

Rici explains more details here:
http://marc.info/?l=apache-httpd-dev&m=109470495819687&w=4

It would be nice to fix the root issue....

-Paul

Re: httpd attempts to open file.html/.htaccess (is this a bug?)

Posted by Giuliano Gavazzi <de...@humph.com>.
On 11 Jun 2007, at 15:01, Allen Pulsifer wrote:

> When processing a "GET /.../file.html", Apache httpd briefly treats
> file.html as a directory and attempts to open
> "docroot/.../file.html/.htaccess".  The os returns ENOTDIR, and then
> processing of the request continues.
[...]
>
> Does anyone else see the same behavior?  Is this a bug?
>
>
> Configuration: Apache httpd v 2.2.4 running on a default installation of
> CentOS-5 (ext3 filesystem).  Tested with stock configuration distributed
> with CentOS-5, as well as a stock installation compiled from the source.
>
> Only change to http.conf is:
> "AllowOverride None" changed to "AllowOverride All"

Same here (2.2.4 on Mac OS X 10.4.9), but with AllowOverride None for the
relevant directory.

Giuliano