You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Allen Pulsifer <pu...@comcast.net> on 2007/06/13 14:09:09 UTC

patch for spurious open attempt on ".../file.html/.htaccess"

server/request.c lines 930 to 940 currently read:

            if (r->finfo.filetype
#ifdef CASE_BLIND_FILESYSTEM
                && (filename_len <= canonical_len)
#endif
                && ((opts.opts & (OPT_SYM_OWNER | OPT_SYM_LINKS)) ==
OPT_SYM_LINKS))
            {

                thisinfo.filetype = APR_DIR;
                ++seg;
                continue;
            }


Here is the corrected code that fixes the spurious open attempt on the file
".../file.html/.htaccess":

            if (r->finfo.filetype
#ifdef CASE_BLIND_FILESYSTEM
                && (filename_len <= canonical_len)
#endif
                && (opts.opts & OPT_SYM_LINKS) )
            {
                ++seg;

                if (r->finfo.filetype == APR_REG
                    && (!r->path_info || !*r->path_info) )
                {
	              thisinfo.filetype = APR_REG;
                    break;
                }

                thisinfo.filetype = APR_DIR;
                continue;
            }


A note on what this fix and the optimization accomplishes:

In the best case (under Linux with FollowSymLinks and AllowOverride's
enabled), this code makes the following accesses to disk:

1. An initial stat on the file to serve
2. An open attempt on the .htaccess in each directory along the path to the
file
3. An open on the file to serve

As far as I can see, the only way this could be optimized further would be
to eliminate the initial stat on the file to serve, or to turn it into a
file open, and then hold the file open until it is served.

Allen


RE: patch for spurious open attempt on ".../file.html/.htaccess"

Posted by Allen Pulsifer <pu...@comcast.net>.
Here's a proper diff, without the spacing issue.  (It was actually a line
terminator issue, caused by editing the file under Windows.)

--- request.c-2.2.4     2007-06-14 06:47:44.850623600 -0400
+++ request.c   2007-06-13 21:34:01.659559100 -0400
@@ -931,12 +931,20 @@
 #ifdef CASE_BLIND_FILESYSTEM
                 && (filename_len <= canonical_len)
 #endif
-                && ((opts.opts & (OPT_SYM_OWNER | OPT_SYM_LINKS)) ==
OPT_SYM_LINKS))
+                && (opts.opts & OPT_SYM_LINKS) )
             {
-
-                thisinfo.filetype = APR_DIR;
-                ++seg;
-                continue;
+                if ((r->path_info && *r->path_info) || r->finfo.filetype ==
APR_DIR)
+                {
+                    thisinfo.filetype = APR_DIR;
+                    ++seg;
+                    continue;
+                }
+                else if (r->finfo.filetype == APR_REG)
+                {
+                    thisinfo.filetype = APR_REG;
+                    ++seg;
+                    break;
+                }
             }

             /* We choose apr_stat with flag APR_FINFO_LINK here, rather
that

RE: patch for spurious open attempt on ".../file.html/.htaccess"

Posted by Allen Pulsifer <pu...@comcast.net>.
> The attached patch is the final one and applies to 2.2.4?

Yes, that is the final patch.  It is a diff from v.2.2.4.

The indentation might not be quite correct after applying that patch.  Due
space-to-tab conversion issues with my editor, I had to do a "diff -u -b",
and the patch is not correctly showing the increased indentation of at least
one short (3 line) section of code.


Re: patch for spurious open attempt on ".../file.html/.htaccess"

Posted by Oden Eriksson <oe...@mandriva.com>.
onsdag 13 juni 2007 skrev Allen Pulsifer:

The attached patch is the final one and applies to 2.2.4?

-- 
Regards // Oden Eriksson

revised(3) patch for spurious open attempt on ".../file.html/.htaccess"

Posted by Allen Pulsifer <pu...@comcast.net>.
I hate to have to do this...

I realized that ++seg should only be incremented when the optimization
results in either a "continue" or a "break".  Here is the correction:

CODE:

            if (r->finfo.filetype
#ifdef CASE_BLIND_FILESYSTEM
                && (filename_len <= canonical_len)
#endif
                && (opts.opts & OPT_SYM_LINKS) )
            {
                if ((r->path_info && *r->path_info) || r->finfo.filetype ==
APR_DIR)
                {
                    thisinfo.filetype = APR_DIR;
                    ++seg;
                    continue;
                }
                else if (r->finfo.filetype == APR_REG)
                {
                    thisinfo.filetype = APR_REG;
                    ++seg;
                    break;
                }
            }

PATCH:

--- .svn/text-base/request.c.svn-base   2007-06-04 23:40:23.801630400 -0400
+++ request.c   2007-06-13 21:34:01.659559100 -0400
@@ -931,13 +931,21 @@
 #ifdef CASE_BLIND_FILESYSTEM
                 && (filename_len <= canonical_len)
 #endif
-                && ((opts.opts & (OPT_SYM_OWNER | OPT_SYM_LINKS)) ==
OPT_SYM_LINKS))
+                && (opts.opts & OPT_SYM_LINKS) )
+            {
+                if ((r->path_info && *r->path_info) || r->finfo.filetype ==
APR_DIR)
             {
-
                 thisinfo.filetype = APR_DIR;
                 ++seg;
                 continue;
             }
+                else if (r->finfo.filetype == APR_REG)
+                {
+                    thisinfo.filetype = APR_REG;
+                    ++seg;
+                    break;
+                }
+            }

             /* We choose apr_stat with flag APR_FINFO_LINK here, rather
that
              * plain apr_stat, so that we capture this path object rather
than


ANALYSIS:

The optimization formerly skipped the lstat and went directly to the
htaccess check, even when it reached the point of checking the full path and
the full path referred to a file.  This resulted in a spurious open attempt
on .../file.html/.htaccess

The revised optimization also skips the lstat and goes directly to the
htaccess check, but only when (a) the directory walk has not yet reached the
full path; or (b) the full path refers to a directory.

Otherwise, when it reaches the full path, if the full path refers to a
regular file, it terminates the directory walk.  If the file type of the
full path is not a directory or a regular file, it continues on with the
unoptimized lstat of the full path.

revised patch for spurious open attempt on ".../file.html/.htaccess"

Posted by Allen Pulsifer <pu...@comcast.net>.
After sending that last post, I realized that if the full path points to a
directory, it will do a final lstat even though it doesn't need to.

That's easy to fix.  Here is the revised code, the revised patch, and test
output.

CODE:

            if (r->finfo.filetype
#ifdef CASE_BLIND_FILESYSTEM
                && (filename_len <= canonical_len)
#endif
                && (opts.opts & OPT_SYM_LINKS) )
            {
                ++seg;

                if (r->finfo.filetype == APR_DIR || (r->path_info &&
*r->path_info))
                {
                    thisinfo.filetype = APR_DIR;
                    continue;
                }
                else if (r->finfo.filetype == APR_REG)
                {
                    thisinfo.filetype = APR_REG;
                    break;
                }
            }

PATCH:

--- .svn/text-base/request.c.svn-base   2007-06-04 23:40:23.801630400 -0400
+++ request.c   2007-06-13 19:54:37.050034200 -0400
@@ -931,13 +931,21 @@
 #ifdef CASE_BLIND_FILESYSTEM
                 && (filename_len <= canonical_len)
 #endif
-                && ((opts.opts & (OPT_SYM_OWNER | OPT_SYM_LINKS)) ==
OPT_SYM_LINKS))
+                && (opts.opts & OPT_SYM_LINKS) )
             {
+                ++seg;

+                if (r->finfo.filetype == APR_DIR || (r->path_info &&
*r->path_info))
+                {
                 thisinfo.filetype = APR_DIR;
-                ++seg;
                 continue;
             }
+                else if (r->finfo.filetype == APR_REG)
+                {
+                    thisinfo.filetype = APR_REG;
+                    break;
+                }
+            }

             /* We choose apr_stat with flag APR_FINFO_LINK here, rather
that
              * plain apr_stat, so that we capture this path object rather
than


TEST OUTPUT:

request: http://localhost/dir/subdir/ (with an index.html file)

[pid 14461] stat64("/usr/local/apache2/htdocs/dir/subdir/",
{st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
[pid 14461] open("/usr/local/apache2/htdocs/.htaccess",
O_RDONLY|O_LARGEFILE) = 12
[pid 14461] open("/usr/local/apache2/htdocs/dir/.htaccess",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 14461] open("/usr/local/apache2/htdocs/dir/subdir/.htaccess",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 14461] stat64("/usr/local/apache2/htdocs/dir/subdir/index.html",
{st_mode=S_IFREG|0755, st_size=51, ...}) = 0
[pid 14461] open("/usr/local/apache2/htdocs/dir/subdir/index.html",
O_RDONLY|O_LARGEFILE) = 12


request: http://localhost/dir/subdir/file.html

[pid 14462] stat64("/usr/local/apache2/htdocs/dir/subdir/file.html",
{st_mode=S_IFREG|0644, st_size=48, ...}) = 0
[pid 14462] open("/usr/local/apache2/htdocs/.htaccess",
O_RDONLY|O_LARGEFILE) = 12
[pid 14462] open("/usr/local/apache2/htdocs/dir/.htaccess",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 14462] open("/usr/local/apache2/htdocs/dir/subdir/.htaccess",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 14462] open("/usr/local/apache2/htdocs/dir/subdir/file.html",
O_RDONLY|O_LARGEFILE) = 12

RE: patch for spurious open attempt on ".../file.html/.htaccess"

Posted by Allen Pulsifer <pu...@comcast.net>.
> >                 if (r->finfo.filetype == APR_REG
> >                     && (!r->path_info || !*r->path_info) )

> Possible stupid question, but why do we need this path_info check?

ap_directory_walk uses r->path_info to hold the portion of the path that has
not yet been walked.  The check on r->path_info tells us if we are at "the
end of the line", i.e., if there are no more path segments to append and
check.  Note that the loop is a little warped: it checks htaccess, appends
the next segment, lstat's the new appended path, then loops back to check
htaccess on the appended path, etc.  So at the time we are doing the
optimization, we are working with the path that has just had the next
segment appended and has not yet been checked for anything.

This inner if statement says "if the full path points to a regular file and
if we have no more path segments to check after this one, then break the
loop."  It is in part the same test that is done at line 887--the comment
there reads "Time for all good things to come to an end?".

Having explained the code, I realize that the test is not exactly what it
should be.  It does not handle the case of being at "the end of the line"
with an unknown file type.

To handle this case, the code should read as follows:

            if (r->finfo.filetype
#ifdef CASE_BLIND_FILESYSTEM
                && (filename_len <= canonical_len)
#endif
                && (opts.opts & OPT_SYM_LINKS) )
            {
                ++seg;

                if (r->path_info && *r->path_info)
                {
                    thisinfo.filetype = APR_DIR;
                    continue;
                }
                else if (r->finfo.filetype == APR_REG)
                {
	              thisinfo.filetype = APR_REG;
                    break;
                }
            }

The inner if's now say:

1. If we have more path segments to check after this one, treat this segment
as a directory and continue with the path walk.

2. Else, we have no more path segment to check after this one, therefore:

(a) If the full path we are checking is a regular file, we are done with all
checks and can break from the loop

(b) else, treat the current path as unknown and continue on with the lstat.

It should be noted that this optimization is not completely optimal with a
filesystem that is not case sensitive like Windows.  With
CASE_BLIND_FILESYSTEM, the computation of canonical_len compares
r->canonical_filename to r->path_info, then deducts the filename
("file.html"). As a result, canonical_len will always be less than the full
path length and the test (filename_len <= canonical_len) will be false once
the file name itself ("file.html") is appended to r->filename.  The net
result is that the final lstat will always be performed a second time on the
full path length.  This could be fixed, but since I'm not sure exactly how
all the canonical_filename stuff is used, I'm not going to touch it.

The Windows code has more problems than that though, since it actually walks
and stats each component in the path twice (see file I/O log included
below.)  It does it a first time in
ap_process_request_internal->ap_run_translate_name->ap_core_translate->apr_f
ilepath_merge (win32 version) which includes an apr_stat on each subdir in
path until it finds a file.  Then it stats each component in the path a
second time in ap_directory_walk.  The unix version of apr_filepath_merge
does not contain an apr_stat so this is not a problem.

> Could you please provide it as a unified diff against trunk?

Attached and included inline, a diff -u against v2.2.4.  It should apply
fine to the trunk since the relevant code has not changed.

--- .svn/text-base/request.c.svn-base   2007-06-04 23:40:23.801630400 -0400
+++ request.c   2007-06-13 18:25:27.104542400 -0400
@@ -931,13 +931,21 @@
 #ifdef CASE_BLIND_FILESYSTEM
                 && (filename_len <= canonical_len)
 #endif
-                && ((opts.opts & (OPT_SYM_OWNER | OPT_SYM_LINKS)) ==
OPT_SYM_LINKS))
+                && (opts.opts & OPT_SYM_LINKS) )
             {
+                ++seg;

+                if (r->path_info && *r->path_info)
+                {
                 thisinfo.filetype = APR_DIR;
-                ++seg;
                 continue;
             }
+                else if (r->finfo.filetype == APR_REG)
+                {
+                    thisinfo.filetype = APR_REG;
+                    break;
+                }
+            }

             /* We choose apr_stat with flag APR_FINFO_LINK here, rather
that
              * plain apr_stat, so that we capture this path object rather
than


------------------

Effect of patch

Configuration:

docroot: /usr/local/apache/htdocs
AllowOverrides and FollowSymLinks are enabled

request: http://localhost/dir/subdir/file.html
This file exists, and there is an .htaccess in the docroot


Linux file I/O:

[pid 10755] stat64("/usr/local/apache2/htdocs/dir/subdir/file.html",
{st_mode=S_IFREG|0644, st_size=48, ...}) = 0
[pid 10755] open("/usr/local/apache2/htdocs/.htaccess",
O_RDONLY|O_LARGEFILE) = 12
[pid 10755] open("/usr/local/apache2/htdocs/dir/.htaccess",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 10755] open("/usr/local/apache2/htdocs/dir/subdir/.htaccess",
O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
[pid 10755] open("/usr/local/apache2/htdocs/dir/subdir/file.html",
O_RDONLY|O_LARGEFILE) = 12


Windows file I/O, with annotations:

The following file I/O is from
ap_process_request_internal
 ap_run_translate_name
  ap_core_translate
   apr_filepath_merge (win32 version)

OPEN	C:\usr\local\apache2\htdocs\	SUCCESS	Options: Open Directory
Access: 00100001
DIRECTORY	C:\usr\local\apache2\htdocs\	SUCCESS
FileBothDirectoryInformation: dir
CLOSE	C:\usr\local\apache2\htdocs\	SUCCESS	
OPEN	C:\usr\local\apache2\htdocs\dir\	SUCCESS	Options: Open
Directory  Access: 00100001
DIRECTORY	C:\usr\local\apache2\htdocs\dir\	SUCCESS
FileBothDirectoryInformation: subdir
CLOSE	C:\usr\local\apache2\htdocs\dir\	SUCCESS	
OPEN	C:\usr\local\apache2\htdocs\dir\subdir\	SUCCESS	Options: Open
Directory  Access: 00100001
DIRECTORY	C:\usr\local\apache2\htdocs\dir\subdir\	SUCCESS
FileBothDirectoryInformation: file.html
CLOSE	C:\usr\local\apache2\htdocs\dir\subdir\	SUCCESS	

The following file I/O is from
ap_run_map_to_storage
 core_map_to_storage
  ap_directory_walk

OPEN	C:\usr\local\apache2\htdocs\dir\subdir\file.html	SUCCESS
Options: Open  Access: 00100080
QUERY INFORMATION	C:\usr\local\apache2\htdocs\dir\subdir\file.html
BUFFER OVERFLOW	FileFsVolumeInformation
QUERY INFORMATION	C:\usr\local\apache2\htdocs\dir\subdir\file.html
BUFFER OVERFLOW	FileAllInformation
CLOSE	C:\usr\local\apache2\htdocs\dir\subdir\file.html	SUCCESS	
OPEN	C:\usr\local\apache2\htdocs\.htaccess	SUCCESS	Options: Open
Access: Read
QUERY INFORMATION	C:\usr\local\apache2\htdocs\.htaccess	BUFFER
OVERFLOW	FileFsVolumeInformation
QUERY INFORMATION	C:\usr\local\apache2\htdocs\.htaccess	BUFFER
OVERFLOW	FileAllInformation
READ 	C:\usr\local\apache2\htdocs\.htaccess	SUCCESS	Offset: 0 Length:
4096
READ	C:\usr\local\apache2\htdocs\.htaccess	END OF FILE	Offset: 63
Length: 4096
READ	C:\usr\local\apache2\htdocs\.htaccess	END OF FILE	Offset: 63
Length: 4096
CLOSE	C:\usr\local\apache2\htdocs\.htaccess	SUCCESS	
OPEN	C:\usr\local\apache2\htdocs\dir\.htaccess	NOT FOUND
Options: Open  Access: Read
OPEN	C:\usr\local\apache2\htdocs\dir\subdir\.htaccess	NOT FOUND
Options: Open  Access: Read
OPEN	C:\usr\local\apache2\htdocs\dir\subdir\	SUCCESS	Options: Open
Directory  Access: 00100001
DIRECTORY	C:\usr\local\apache2\htdocs\dir\subdir\	SUCCESS
FileBothDirectoryInformation: file.html
CLOSE	C:\usr\local\apache2\htdocs\dir\subdir\	SUCCESS	

The following file I/O is from
core default_handler and subsequent clean up

OPEN	C:\usr\local\apache2\htdocs\dir\subdir\file.html	SUCCESS
Options: Open  Access: Read
READ 	C:\usr\local\apache2\htdocs\dir\subdir\file.html	SUCCESS
Offset: 0 Length: 60
LOCK	C:\Apache22\logs\access.log	SUCCESS	Excl: Yes Offset: 0 Length:
-1
QUERY INFORMATION	C:\Apache22\logs\access.log	SUCCESS	Length: 8386
WRITE 	C:\Apache22\logs\access.log	SUCCESS	Offset: 8386 Length: 88
UNLOCK	C:\Apache22\logs\access.log	SUCCESS	Offset: 0 Length: -1
CLOSE	C:\usr\local\apache2\htdocs\dir\subdir\file.html	SUCCESS	

Re: patch for spurious open attempt on ".../file.html/.htaccess"

Posted by Ruediger Pluem <rp...@apache.org>.

On 06/13/2007 02:09 PM, Allen Pulsifer wrote:

Thanks for the patch. Could you please provide it as a unified diff against trunk?
Believe it or not but I guess that is easier to read for many (most?) people here
than (this is old code / this is new code) :-).

> server/request.c lines 930 to 940 currently read:
> 
>             if (r->finfo.filetype
> #ifdef CASE_BLIND_FILESYSTEM
>                 && (filename_len <= canonical_len)
> #endif
>                 && ((opts.opts & (OPT_SYM_OWNER | OPT_SYM_LINKS)) ==
> OPT_SYM_LINKS))
>             {
> 
>                 thisinfo.filetype = APR_DIR;
>                 ++seg;
>                 continue;
>             }
> 
> 
> Here is the corrected code that fixes the spurious open attempt on the file
> ".../file.html/.htaccess":
> 
>             if (r->finfo.filetype
> #ifdef CASE_BLIND_FILESYSTEM
>                 && (filename_len <= canonical_len)
> #endif
>                 && (opts.opts & OPT_SYM_LINKS) )
>             {
>                 ++seg;
> 
>                 if (r->finfo.filetype == APR_REG
>                     && (!r->path_info || !*r->path_info) )

Possible stupid question, but why do we need this path_info check?

>                 {
> 	              thisinfo.filetype = APR_REG;
>                     break;
>                 }
> 
>                 thisinfo.filetype = APR_DIR;
>                 continue;
>             }
> 

Regards

RĂ¼diger