You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Ken Parzygnat <kp...@raleigh.ibm.com> on 1998/10/15 00:17:25 UTC

Canonical filename overhaul

ATTENTION: LONG POST!!!!

Time to stir up this hornets nest again.

I've been working on Apache PR's for Win32 for
a short time, but I've found that about 90% of
the problems are boiling down to a bug in
ap_os_canonical_filename.

There are still problems lying around that
relate to this, and there are still more to 
come.  For example, the <Files> directive seems
to be completely broken on Win32.

I've spent the last few days digging through
the new-http archives, and looking at the code
to see what the problems have been, and what
the discussions have been.  I really believe
that ap_os_canonical_filename is far more
complicated than it needs to be.  Therefore,
I've decided to overhaul the entire process
and what I have so far looks like it may work. 

One thing to keep in mind is that the UNIX
ap_os_canonical_filename is a no-op.  That is,
it returns exactly what it gets.  This is 
partly why I believe the Win32 version needs
to be very simple, and only do what it minimally
needs to do to transform a path into a 
well known format.

Part of my testing involved carving out
ap_os_canonical_filename and placing it in a 
test program so I could examine its inputs and
outputs.  Here is a sample of some areas which I feel 
the routine is not behaving as it should.

Here are some of the results:
(The Labeling here is as follows:
  "Path: [path]" means 'path' was the input
  "Old : [path]" means 'path' was output from the
                 existing ap_os_canonical_filename)
                

Path: [*.perl]
Old : [f:/work/canon/debug/*.perl]

This really seems wrong to me.  ap_os_canonical_filename
has defaulted the current drive and current path.  This
is the primary reason that the <Files> directive is broken
on Win32.  In fact, the current drive and current path
are defaulted onto any string which is not a UNC name.  It
seems that we should never do this, we should only be 
getting default drive and path information from ServerRoot
or DocumentRoot.  

Path: [*.PERL]
Old : [f:/work/canon/debug/*.PERL]

This looks a lot like the last one, but you'll notice that
the case is still upper case.  This is because 
ap_os_canonical_filename could not find this file, so it
didn't change the case.  Again, this breaks cases were we
are looking for matches such as the <Files> directive.

Path: [f:\apache\htdocs\..\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]

This does not seem correct to me because we have eliminated
the valid path component '..'.  We should leave this for calls
to ap_getparents.  

Path: [//Apache/HTdocs]
Old : [//Apache/HTdocs\]

Path: [//Grumble////hello//world]
Old : [//Grumble////hello\//world]

It's pretty self evident what has messed up here.

Path: [hello.html]
Old : [f:/work/canon/debug/hello.html]

Path: [f:]
Old : [f:/work/canon/debug/*]

Path: [/]
Old : [f:/]

Path: [*]
Old : [f:/work/canon/debug/*]

More assumptions of drives and paths.

Path: [f:\APACHE\HTDOCS1\Welcome.html]
Old : [f:/apache/HTDOCS1/Welcome.html]

Path: [f:\APACHE\HTDOCS\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]

In the above two, you'll notice that one version
got lower cased, while the other partly did.  This
is because f:\apache\htdocs\Welcome.html exists
on my system. However, there is no f:\apache\htdocs1
directory.  Seems to me that we want to be sure
when we are lowercasing and when we are not.


Path: [.\]
Old : [ASSERT ERROR!!!!!!!]

Path: [...\]
Old : [ASSERT ERROR!!!!!!!

Path: [...\.]
Old : [ASSERT ERROR!!!!!!!]

Path: [..]
Old : [ASSERT ERROR!!!!!!!]

These are just a few of the cases that just
assert with the current routine.  The list
goes on.  Granted some are handled because
we call ap_getparents before the call to this
routine.  However, I wouldn't want to bet
that everything is caught.



OK, catch your breath....


Knowing that the Unix version of 
ap_os_canonical_filename is a no-op means that
there is lots of code in the server to do basic
path crunching (i.e. '..' and '.' and '///' etc).
So, it seems that we only need ap_os_canonical_filename
to change the path into a format 
that the existing code can deal with, and accomplish this
by doing as little as possible.  We want to try to 
assume as little as possible, because certainly
/foo/bar/welcome.html is valid on Win32.

So why not just lowercase everything and change all
'\' to '/'?  Because you still have the case issue
lurking for things like PATH_INFO.  And for PATH_INFO
you want to be sure that you preserve the case that
you are given.  Therefore, in my overhaul, I'm proposing
two functions:
ap_os_canonical_filename
ap_os_case_canonical_filename

ap_os_case_canonical_filename takes a path and formats
it without changing any case of the input components.

ap_os_canonical_filename does the same as above, but
calls strlwr on the output string.

The only place I've found that needs 
ap_os_case_canonical_filename is in directory_walk.  I'll
discuss that later.

So this is what I think ap_os_case_canonical_filename needs
to do:
1. Change '\' to '/'
2. Eliminate trailing '.' characters.  I felt this was
   important as a security issue since Win32 will ignore
   a trailing '.'.
3. Eliminate directories with more than 2 '.'s.  While on
   Unix it is valid to create a directory '...', it is 
   not on Win32.  This is a very suspect directory, and
   therefore, as a security issue, I've opted to eliminate
   it.  This could be up for discussion.
   
That's it!  This eliminated the recursive calls, calls
to Win32 functions like GetFullPathName and FindFirstFile
that exist in sub_canonical_filename; in fact there is
no more sub_canonical_filename.  I believe this greatly 
simplifies that entire code path.

Now, let me rehash the output above, but include
the new routine output:

Path: [*.perl]
Old : [f:/work/canon/debug/*.perl]
ap_os_case_canonical_filename : [*.perl]
ap_os_canonical_filename      : [*.perl]

Path: [*.PERL]
Old : [f:/work/canon/debug/*.PERL]
ap_os_case_canonical_filename : [*.PERL]
ap_os_canonical_filename      : [*.perl]

Notice the new routine does not add a default
drive and path.  When parsing the <Files> directive,
the server uses ap_os_canonical_filename so it will
always get lower case.

Path: [f:\apache\htdocs\..\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/../Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/../welcome.html]

The '..' is preserved because it is valid.

Path: [//Apache/HTdocs]
Old : [//Apache/HTdocs\]
ap_os_case_canonical_filename : [//Apache/HTdocs]
ap_os_canonical_filename      : [//apache/htdocs]

Path: [//Grumble////hello//world]
Old : [//Grumble////hello\//world]
ap_os_case_canonical_filename : [//Grumble////hello//world]
ap_os_canonical_filename      : [//grumble////hello//world]

Since we are simpler, we don't get hung up on more complicated
cases.

Path: [hello.html]
Old : [f:/work/canon/debug/hello.html]
ap_os_case_canonical_filename : [hello.html]
ap_os_canonical_filename      : [hello.html]

Path: [f:]
Old : [f:/work/canon/debug/*]
ap_os_case_canonical_filename : [f:]
ap_os_canonical_filename      : [f:]

Path: [/]
Old : [f:/]
ap_os_case_canonical_filename : [/]
ap_os_canonical_filename      : [/]

Path: [*]
Old : [f:/work/canon/debug/*]
ap_os_case_canonical_filename : [*]
ap_os_canonical_filename      : [*]

No assumptions of drives and paths.

Path: [f:\APACHE\HTDOCS1\Welcome.html]
Old : [f:/apache/HTDOCS1/Welcome.html]
ap_os_case_canonical_filename : [f:/APACHE/HTDOCS1/Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs1/welcome.html]

Path: [f:\APACHE\HTDOCS\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/APACHE/HTDOCS/Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/welcome.html]

We always get predictable output.


Path: [f:\apache...\htdocs..\NoDocs.....\World..\Hi.html]
Old : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_canonical_filename      : [f:/apache/htdocs/nodocs/world/hi.html]

Path: [f:\apache.\htdocs.\NoDocs.\World.\Hi.html]
Old : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/NoDocs/World/Hi.html]
ap_os_canonical_filename      : [f:/apache/htdocs/nodocs/world/hi.html]

Path: [f:\apache\htdocs\Welcome.html....]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/welcome.html]

Path: [f:\apache\...\htdocs\..\Welcome.html]
Old : [f:/apache/htdocs/welcome.html]
ap_os_case_canonical_filename : [f:/apache/htdocs/../Welcome.html]
ap_os_canonical_filename      : [f:/apache/htdocs/../welcome.html]

Just wanted to demonstrate the elimination of trailing '.'

Path: [.\]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [./]
ap_os_canonical_filename      : [./]

Path: [...\]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [/]
ap_os_canonical_filename      : [/]

Path: [...\.]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [/]
ap_os_canonical_filename      : [/]

Path: [...]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : []
ap_os_canonical_filename      : []

Path: [..]
Old : [ASSERT ERROR!!!!!!!]
ap_os_case_canonical_filename : [..]
ap_os_canonical_filename      : [..]


I'll put the patches into a later
post since this one is getting so big.  However,
I did want to bring up the one change I 
made to http_request.c since it is the only
change that will affect the UNIX code path.  Here's the patch:

--- http_request.c.first	Tue Oct 06 19:12:24 1998
+++ http_request.c	Tue Oct 13 21:57:14 1998
@@ -359,16 +359,19 @@
         return OK;
     }
 
-    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
-    test_filename = ap_pstrdup(r->pool, r->filename);
-
-    ap_no2slash(test_filename);
-    num_dirs = ap_count_dirs(test_filename);
+    r->filename   = ap_os_case_canonical_filename(r->pool, r->filename);
 
     res = get_path_info(r);
     if (res != OK) {
         return res;
     }
+
+    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
+
+    test_filename = ap_pstrdup(r->pool, r->filename);
+
+    ap_no2slash(test_filename);
+    num_dirs = ap_count_dirs(test_filename);
 
     if ((res = check_safe_file(r))) {
         return res;

This is in directory_walk.  The existing code calls 
ap_os_canonical_filename then get_path_info.  PATH_INFO
should maintain the case that is passed in on the URI.
The only reason it works today is because 
ap_os_canonical_filename does not find the path, and
therefore does not change the case.  I've modified this
to be a little more explicit.  I call 
ap_os_case_canonical_filename which will maintain case, 
then call get_path_info.  After that, I call 
ap_os_canonical_filename which will put it in the known form
that server will use internally.  Remember, both these
calls are no-ops on Unix. 

I've tested the code on my NT box and ran simple GETs, 
CGIs, Alias directive tests, and Files directive tests, and
everything seems to be working fine.  Now I need for someone
else to look at all of this and see if there is a hole.

Ok, that's it.  I'll be happy to post my patches as soon
as someone wants, but I wanted to give you all the 
information I had so you could chew on it.

Thanks for taking time.
- - - - - - - - - - - - - - - - - -
Ken Parzygnat
email: kparz@raleigh.ibm.com

RE: [PATCH] RE: Canonical filename overhaul

Posted by Ken Parzygnat <kp...@raleigh.ibm.com>.

I haven't heard any great gagging sounds from
this patch.  Could be good news, could be bad
news. :-)

Has anybody tried this, or have any comments 
on this proposed change?  Are there any concerns
about committing it?

- - - - - - - - - - - - - - - - - -
Ken Parzygnat
email: kparz@raleigh.ibm.com

RE: [PATCH] RE: Canonical filename overhaul

Posted by Ken Parzygnat <kp...@raleigh.ibm.com>.

> > 
> > BTW, wouldn't it make more sense to return NULL if szFile is NULL?
> 
> Probably.  I've updated this too.  I was originally thinking that we
> needed to return some string.  This sort of falls under your next
> comment.
> 
After looking at this a little more, there are places in the core code that
don't check for a NULL return from ap_os_canonical_filename.  Therefore,
it probably is safer to return an empty string for now.

> > 
> > Double-BTW, there isn't a defined way for ap_os_canonical_filename to
> > return an error. Do we want one?
> > 
> I'm back and forth on this.  In one respect it would be nice to have error
> codes to indicate things like NULL parameters and filenames that are way
> too long.  But on the other hand, I think that ap_os_canonical_filename
> should make the best attempt it can at returning something since we don't
> want to make too many assumptions about the input.  I'm really not sold
> either way.
If we decide to do this, this should probably go in on its own patch since
we will have to change core code to look for error codes. 

So, anyone against committing this patch?  Any +1's?


- - - - - - - - - - - - - - - - - -
Ken Parzygnat
email: kparz@raleigh.ibm.com

RE: [PATCH] RE: Canonical filename overhaul

Posted by Ken Parzygnat <kp...@raleigh.ibm.com>.

> > 
> > > Ok, here's the patch for you to play with.  This patch is what
> > > was previously discussed with one addition.  I heard through
> > > the grapevine (credit Marc) that one problem was related to
> > > use of alias names (the loveable "blah~1" type names).  Therefore,
> > > I had another piece of code up my sleeve to address another problem,
> > > but I'm putting it on the table now because it will solve this problem.
> > 
> > This doesn't want to apply for me; the util_win32.c chunks fail
> > completely.
> > 
> > Could you either see if that may be something on your end or just send the
> > whole util_win32.c file?
> > 
> > A few notes:
> > 
> > Get rid of the ap_assert()s.  There is no need for them and it is bad
> > coding practice to use them for "normal" error conditions.  Yea, I
> > know, they are a carryover from the existing code but it is broken.
> 
> There's only one mentioned in the patch that I can see (on the + side),
> and that's:
> 
> +    ap_assert(szFile != NULL);
> +    if (szFile == NULL || strlen(szFile) == 0)
> +        return ap_pstrdup(pPool, "");
> 
> which is either correct, if we don't think you should pass a NULL to
> ap_os_canonical_filename, in which case the following if is wrong, or
> incorrect if the following if is correct.

Yep.  I was thinking that ap_assert was only in affect on debug
builds.  I've changed this in the last repost.

> 
> BTW, wouldn't it make more sense to return NULL if szFile is NULL?

Probably.  I've updated this too.  I was originally thinking that we
needed to return some string.  This sort of falls under your next
comment.

> 
> Double-BTW, there isn't a defined way for ap_os_canonical_filename to
> return an error. Do we want one?
> 
I'm back and forth on this.  In one respect it would be nice to have error
codes to indicate things like NULL parameters and filenames that are way
too long.  But on the other hand, I think that ap_os_canonical_filename
should make the best attempt it can at returning something since we don't
want to make too many assumptions about the input.  I'm really not sold
either way.

Thanks for your comments Ben!
- - - - - - - - - - - - - - - - - -
Ken Parzygnat
email: kparz@raleigh.ibm.com

Re: [PATCH] RE: Canonical filename overhaul

Posted by Ben Laurie <be...@algroup.co.uk>.

Marc Slemko wrote:
> 
> On Mon, 19 Oct 1998, Ken Parzygnat wrote:
> 
> > Ok, here's the patch for you to play with.  This patch is what
> > was previously discussed with one addition.  I heard through
> > the grapevine (credit Marc) that one problem was related to
> > use of alias names (the loveable "blah~1" type names).  Therefore,
> > I had another piece of code up my sleeve to address another problem,
> > but I'm putting it on the table now because it will solve this problem.
> 
> This doesn't want to apply for me; the util_win32.c chunks fail
> completely.
> 
> Could you either see if that may be something on your end or just send the
> whole util_win32.c file?
> 
> A few notes:
> 
> Get rid of the ap_assert()s.  There is no need for them and it is bad
> coding practice to use them for "normal" error conditions.  Yea, I
> know, they are a carryover from the existing code but it is broken.

There's only one mentioned in the patch that I can see (on the + side),
and that's:

+    ap_assert(szFile != NULL);
+    if (szFile == NULL || strlen(szFile) == 0)
+        return ap_pstrdup(pPool, "");

which is either correct, if we don't think you should pass a NULL to
ap_os_canonical_filename, in which case the following if is wrong, or
incorrect if the following if is correct.

BTW, wouldn't it make more sense to return NULL if szFile is NULL?

Double-BTW, there isn't a defined way for ap_os_canonical_filename to
return an error. Do we want one?

Cheers,

Ben.

-- 
Ben Laurie            |Phone: +44 (181) 735 0686| Apache Group member
Freelance Consultant  |Fax:   +44 (181) 735 0689|http://www.apache.org/
and Technical Director|Email: ben@algroup.co.uk |
A.L. Digital Ltd,     |Apache-SSL author     http://www.apache-ssl.org/
London, England.      |"Apache: TDG" http://www.ora.com/catalog/apache/

RE: [PATCH] RE: Canonical filename overhaul

Posted by Ken Parzygnat <kp...@raleigh.ibm.com>.

> > Ok, here's the patch for you to play with.  This patch is what
> > was previously discussed with one addition.  I heard through
> > the grapevine (credit Marc) that one problem was related to 
> > use of alias names (the loveable "blah~1" type names).  Therefore,
> > I had another piece of code up my sleeve to address another problem,
> > but I'm putting it on the table now because it will solve this problem.
> 
> This doesn't want to apply for me; the util_win32.c chunks fail
> completely.  
> 
> Could you either see if that may be something on your end or just send the
> whole util_win32.c file?

I think if you tell patch to ignore whitespace, it will work.  In my version
of patch the option is --ignore-whitespace.  However, since I'm just about
replacing the entire file, I'll post the whole file here.  If anything, it'll
be easier to read here.

> 
> A few notes:
> 
> Get rid of the ap_assert()s.  There is no need for them and it is bad 
> coding practice to use them for "normal" error conditions.  Yea, I 
> know, they are a carryover from the existing code but it is broken.

OK.  The code below has them removed.  I was coding under the
false assumption that asserts only had affect on debug builds.
I didn't remove them from sections of code I did not modify, 
do you want those blown away too?

> 
> Are you sure you want to return "" if the arg is too long for
> the buffer in ap_os_systemcase_filename?  
> 
Well, if the name is too long for the buffer, the name is over
8K, so I'm betting it's bogus.  Maybe returning NULL is a 
better idea?  This sort of falls under Ben's comment about
whether or not we should return error codes.

> second, what about trailing spaces?  In some cases, opening "file " will
> give you "file".  I don't think we deal with it now, but that doesn't mean
> we shouldn't. 
Ah-ha! Good catch.  I've updated the code to remove trailing spaces
from the filename.  It is reflected below. Thanks.

Here's the code, with modifications noted by you and Ben.
I also made a change to the header where the two new routines are
defined.  I don't want to break other OS's that have 
HAVE_CANONICAL_FILENAME defined (e.g. OS/2).
I've included the patch files for httpd.h and http_request.c, and
I copied in util_win32.c wholesale.

Thanks for taking time to look at this.

--- .\include\httpd.h.orig	Wed Oct 28 20:12:18 1998
+++ .\include\httpd.h	Mon Nov 02 16:01:27 1998
@@ -998,8 +998,17 @@

 #ifndef HAVE_CANONICAL_FILENAME
 #define ap_os_canonical_filename(p,f)  (f)
+#define ap_os_case_canonical_filename(p,f)  (f)
+#define ap_os_systemcase_filename(p,f)  (f)
 #else
 API_EXPORT(char *) ap_os_canonical_filename(pool *p, const char *file);
+#ifdef WIN32
+API_EXPORT(char *) ap_os_case_canonical_filename(pool *pPool, const char *szFile);
+API_EXPORT(char *) ap_os_systemcase_filename(pool *pPool, const char *szFile);
+#else
+#define ap_os_case_canonical_filename(p,f) ap_os_canonical_filename(p,f)
+#define ap_os_systemcase_filename(p,f) ap_os_canonical_filename(p,f)
+#endif
 #endif

 #ifdef _OSD_POSIX
--- .\main\http_request.c.orig	Tue Oct 20 19:12:26 1998
+++ .\main\http_request.c	Mon Nov 02 14:30:23 1998
@@ -359,16 +359,19 @@
         return OK;
     }

-    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
-    test_filename = ap_pstrdup(r->pool, r->filename);
-
-    ap_no2slash(test_filename);
-    num_dirs = ap_count_dirs(test_filename);
+    r->filename   = ap_os_case_canonical_filename(r->pool, r->filename);

     res = get_path_info(r);
     if (res != OK) {
         return res;
     }
+
+    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
+
+    test_filename = ap_pstrdup(r->pool, r->filename);
+
+    ap_no2slash(test_filename);
+    num_dirs = ap_count_dirs(test_filename);

     if ((res = check_safe_file(r))) {
         return res;



============== UTIL_WIN32.C =====================
#include <windows.h>
#include <sys/stat.h>
#include <stdarg.h>

#include "httpd.h"
#include "http_log.h"

/* Returns TRUE if the input string is a string
 * of one or more '.' characters.
 */
static BOOL OnlyDots(char *pString)
{
    char *c;

    if (*pString == '\0')
        return FALSE;

    for (c = pString;*c;c++)
        if (*c != '.')
            return FALSE;

    return TRUE;
}

/* Accepts as input a pathname, and tries to match it to an
 * existing path and return the pathname in the case that
 * is present on the existing path.  This routine also
 * converts alias names to long names.
 */
API_EXPORT(char *) ap_os_systemcase_filename(pool *pPool,
                                             const char *szFile)
{
    char buf[HUGE_STRING_LEN];
    char *pInputName;
    char *p, *q;
    BOOL bDone = FALSE;
    BOOL bFileExists = TRUE;
    HANDLE hFind;
    WIN32_FIND_DATA wfd;

    if (!szFile || strlen(szFile) >= sizeof(buf))
        return NULL;

    if (strlen(szFile) == 0)
        return ap_pstrdup(pPool, "");

    buf[0] = '\0';
    pInputName = ap_pstrdup(pPool, szFile);

    /* First convert all slashes to \ so Win32 calls work OK */
    for (p = pInputName; *p; p++) {
        if (*p == '/')
            *p = '\\';
    }

    p = pInputName;
    /* If there is drive information, copy it over. */
    if (pInputName[1] == ':') {
        buf[0] = tolower(*p++);
        buf[1] = *p++;
        buf[2] = '\0';

        /* If all we have is a drive letter, then we are done */
        if (strlen(pInputName) == 2)
            bDone = TRUE;
    }

    q = p;
    if (*p == '\\') {
        p++;
        if (*p == '\\')  /* Possible UNC name */
        {
            p++;
            /* Get past the machine name.  FindFirstFile */
            /* will not find a machine name only */
            p = strchr(p, '\\');
            if (p)
            {
                p++;
                /* Get past the share name.  FindFirstFile */
                /* will not find a \\machine\share name only */
                p = strchr(p, '\\');
                if (p) {
                    strncat(buf,q,p-q);
                    q = p;
                    p++;
                }
            }

            if (!p)
                p = q;
        }
    }

    p = strchr(p, '\\');

    while (!bDone) {
        if (p)
            *p = '\0';

        if (strchr(q, '*') || strchr(q, '?'))
            bFileExists = FALSE;

        /* If the path exists so far, call FindFirstFile
         * again.  However, if this portion of the path contains
         * only '.' charaters, skip the call to FindFirstFile
         * since it will convert '.' and '..' to actual names.
         * Note: in the call to OnlyDots, we may have to skip
         *       a leading slash.
         */
        if (bFileExists && !OnlyDots((*q == '.' ? q : q+1))) {
            hFind = FindFirstFile(pInputName, &wfd);

            if (hFind == INVALID_HANDLE_VALUE) {
                bFileExists = FALSE;
            }
            else {
                FindClose(hFind);

                if (*q == '\\')
                    strcat(buf,"\\");
                strcat(buf, wfd.cFileName);
            }
        }

        if (!bFileExists || OnlyDots((*q == '.' ? q : q+1))) {
            strcat(buf, q);
        }

        if (p) {
            q = p;
            *p++ = '\\';
            p = strchr(p, '\\');
        }
        else {
            bDone = TRUE;
        }
    }

    /* First convert all slashes to / so server code handles it ok */
    for (p = buf; *p; p++) {
        if (*p == '\\')
            *p = '/';
    }

    return ap_pstrdup(pPool, buf);
}


/*  Perform canonicalization with the exception that the
 *  input case is preserved.
 */
API_EXPORT(char *) ap_os_case_canonical_filename(pool *pPool,
                                                 const char *szFile)
{
    char *pNewStr;
    char *s;
    char *p;
    char *q;

    if (szFile == NULL)
        return NULL;

    if (strlen(szFile) == 0)
        return ap_pstrdup(pPool, "");

    pNewStr = ap_pstrdup(pPool, szFile);

    /*  Change all '\' characters to '/' characters.
     *  While doing this, remove any trailing '.'.
     *  Also, blow away any directories with 3 or
     *  more '.'
     */
    for (p = pNewStr,s = pNewStr; *s; s++,p++) {
        if (*s == '\\' || *s == '/') {

            q = p;
            while (p > pNewStr && *(p-1) == '.')
                p--;

            if (p == pNewStr && q-p <= 2 && *p == '.')
                p = q;
            else if (p > pNewStr && p < q && *(p-1) == '/') {
                if (q-p > 2)
                    p--;
                else
                    p = q;
            }

            *p = '/';
        }
        else {
            *p = *s;
        }
    }
    *p = '\0';

    /*  Blow away any final trailing '.' since on Win32
     *  foo.bat == foo.bat. == foo.bat... etc.
     *  Also blow away any trailing spaces since
     *  "filename" == "filename "
     */
    q = p;
    while (p > pNewStr && (*(p-1) == '.' || *(p-1) == ' '))
        p--;
    if ((p > pNewStr) ||
        (p == pNewStr && q-p > 2))
        *p = '\0';


    /*  One more security issue to deal with.  Win32 allows
     *  you to create long filenames.  However, alias filenames
     *  are always created so that the filename will
     *  conform to 8.3 rules.  According to the Microsoft
     *  Developer's network CD (1/98)
     *  "Automatically generated aliases are composed of the
     *   first six characters of the filename plus ~n
     *   (where n is a number) and the first three characters
     *   after the last period."
     *  Here, we attempt to detect and decode these names.
     */
    p = strchr(pNewStr, '~');
    if (p != NULL) {
        char *pConvertedName, *pQstr, *pPstr;
        char buf[HUGE_STRING_LEN];
        /* We potentially have a short name.  Call
         * ap_os_systemcase_filename to examine the filesystem
         * and possibly extract the long name.
         */
        pConvertedName = ap_os_systemcase_filename(pPool, pNewStr);

        /* Since we want to preserve the incoming case as much
         * as we can, compare for differences in the string and
         * only substitute in the path names that changed.
         */
        if (stricmp(pNewStr, pConvertedName)) {
            buf[0] = '\0';

            q = pQstr = pConvertedName;
            p = pPstr = pNewStr;
            do {
                q = strchr(q,'/');
                p = strchr(p,'/');

                if (p != NULL) {
                    *q = '\0';
                    *p = '\0';
                }

                if (stricmp(pQstr, pPstr))
                    strcat(buf, pQstr);   /* Converted name */
                else
                    strcat(buf, pPstr);   /* Original name  */


                if (p != NULL) {
                    pQstr = q;
                    pPstr = p;
                    *q++ = '/';
                    *p++ = '/';
                }

            } while (p != NULL);

            pNewStr = ap_pstrdup(pPool, buf);
        }
    }


    return pNewStr;
}

/*  Perform complete canonicalization.
 */
API_EXPORT(char *) ap_os_canonical_filename(pool *pPool, const char *szFile)
{
    char *pNewName;
    pNewName = ap_os_case_canonical_filename(pPool, szFile);
    strlwr(pNewName);
    return pNewName;
}

/* Win95 doesn't like trailing /s. NT and Unix don't mind. This works
 * around the problem.
 * Errr... except if it is UNC and we are referring to the root of
 * the UNC, we MUST have a trailing \ and we can't use /s. Jeez.
 * Not sure if this refers to all UNCs or just roots,
 * but I'm going to fix it for all cases for now. (Ben)
 */

#undef stat
API_EXPORT(int) os_stat(const char *szPath, struct stat *pStat)
{
    int n;

    if (strlen(szPath) == 0) {
        return -1;
    }

    if (szPath[0] == '/' && szPath[1] == '/') {
	char buf[_MAX_PATH];
	char *s;
	int nSlashes = 0;

	ap_assert(strlen(szPath) < _MAX_PATH);
	strcpy(buf, szPath);
	for (s = buf; *s; ++s) {
	    if (*s == '/') {
		*s = '\\';
		++nSlashes;
	    }
	}
	/* then we need to add one more to get \\machine\share\ */
	if (nSlashes == 3) {
	    *s++ = '\\';
	}
	*s = '\0';
	return stat(buf, pStat);
    }

    /*
     * Below removes the trailing /, however, do not remove
     * it in the case of 'x:/' or stat will fail
     */
    n = strlen(szPath);
    if ((szPath[n - 1] == '\\' || szPath[n - 1] == '/') &&
        !(n == 3 && szPath[1] == ':')) {
        char buf[_MAX_PATH];

        ap_assert(n < _MAX_PATH);
        strcpy(buf, szPath);
        buf[n - 1] = '\0';

        return stat(buf, pStat);
    }
    return stat(szPath, pStat);
}

/* Fix two really crap problems with Win32 spawn[lv]e*:
 *
 *  1. Win32 doesn't deal with spaces in argv.
 *  2. Win95 doesn't like / in cmdname.
 */

#undef _spawnv
API_EXPORT(int) os_spawnv(int mode, const char *cmdname,
			  const char *const *argv)
{
    int n;
    char **aszArgs;
    const char *szArg;
    char *szCmd;
    char *s;

    szCmd = _alloca(strlen(cmdname)+1);
    strcpy(szCmd, cmdname);
    for (s = szCmd; *s; ++s) {
        if (*s == '/') {
            *s = '\\';
	}
    }

    for (n = 0; argv[n]; ++n)
        ;

    aszArgs = _alloca((n + 1) * sizeof(const char *));

    for (n = 0; szArg = argv[n]; ++n) {
        if (strchr(szArg, ' ')) {
            int l = strlen(szArg);

            aszArgs[n] = _alloca(l + 2 + 1);
            aszArgs[n][0] = '"';
            strcpy(&aszArgs[n][1], szArg);
            aszArgs[n][l + 1] = '"';
            aszArgs[n][l + 2] = '\0';
        }
        else {
            aszArgs[n] = (char *)szArg;
        }
    }

    aszArgs[n] = NULL;

    return _spawnv(mode, szCmd, aszArgs);
}

#undef _spawnve
API_EXPORT(int) os_spawnve(int mode, const char *cmdname,
			   const char *const *argv, const char *const *envp)
{
    int n;
    char **aszArgs;
    const char *szArg;
    char *szCmd;
    char *s;

    szCmd = _alloca(strlen(cmdname)+1);
    strcpy(szCmd, cmdname);
    for (s = szCmd; *s; ++s) {
        if (*s == '/') {
            *s = '\\';
	}
    }

    for (n = 0; argv[n]; ++n)
        ;

    aszArgs = _alloca((n + 1)*sizeof(const char *));

    for (n = 0; szArg = argv[n]; ++n){
        if (strchr(szArg, ' ')) {
            int l = strlen(szArg);

            aszArgs[n] = _alloca(l + 2 + 1);
            aszArgs[n][0] = '"';
            strcpy(&aszArgs[n][1], szArg);
            aszArgs[n][l + 1] = '"';
            aszArgs[n][l + 2] = '\0';
        }
        else {
            aszArgs[n] = (char *)szArg;
        }
    }

    aszArgs[n] = NULL;

    return _spawnve(mode, szCmd, aszArgs, envp);
}

API_EXPORT(int) os_spawnle(int mode, const char *cmdname, ...)
{
    int n;
    va_list vlist;
    char **aszArgs;
    const char *szArg;
    const char *const *aszEnv;
    char *szCmd;
    char *s;

    szCmd = _alloca(strlen(cmdname)+1);
    strcpy(szCmd, cmdname);
    for (s = szCmd; *s; ++s) {
        if (*s == '/') {
            *s = '\\';
	}
    }

    va_start(vlist, cmdname);
    for (n = 0; va_arg(vlist, const char *); ++n)
        ;
    va_end(vlist);

    aszArgs = _alloca((n + 1) * sizeof(const char *));

    va_start(vlist, cmdname);
    for (n = 0; szArg = va_arg(vlist, const char *); ++n) {
        if (strchr(szArg, ' ')) {
            int l = strlen(szArg);

            aszArgs[n] = _alloca(l + 2 + 1);
            aszArgs[n][0] = '"';
            strcpy(&aszArgs[n][1], szArg);
            aszArgs[n][l + 1] = '"';
            aszArgs[n][l + 2] = '\0';
        }
        else {
            aszArgs[n] = (char *)szArg;
        }
    }

    aszArgs[n] = NULL;

    aszEnv = va_arg(vlist, const char *const *);
    va_end(vlist);

    return _spawnve(mode, szCmd, aszArgs, aszEnv);
}


- - - - - - - - - - - - - - - - - -
Ken Parzygnat
email: kparz@raleigh.ibm.com 






--------------------------------------------------------------------------------
Users of the Apache webserver are hereby granted a non-exclusive, irrevocable,
world-wide, royalty-free, non-transferable license to use, execute, prepare
derivative works of, and distribute (internally and externally, and including
derivative works) the code accompanying this license as part of, and
integrated into the Apache webserver.  This code is provided "AS IS" WITHOUT
WARRANTY OF ANY KIND EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
THE IMPLIED WARRANTY OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
AND ANY WARRANTY OF NON-INFRINGEMENT.  THE ENTIRE RISK ARISING OUT OF THE USE
OR PERFORMANCE OF THIS CODE REMAINS WITH USERS OF THE APACHE WEBSERVER.  The
owner of this code represents and warrants that it is legally entitled to
grant the above license.

Re: [PATCH] RE: Canonical filename overhaul

Posted by Marc Slemko <ma...@worldgate.com>.

A few more comments:

first, looking over the test cases you posted, I like this new code even
more.  Thanks, this is the source of many bug reports and should be the
source of many more if people knew what to expect.

second, what about trailing spaces?  In some cases, opening "file " will
give you "file".  I don't think we deal with it now, but that doesn't mean
we shouldn't.

Re: [PATCH] RE: Canonical filename overhaul

Posted by Marc Slemko <ma...@worldgate.com>.

On Mon, 19 Oct 1998, Ken Parzygnat wrote:

> Ok, here's the patch for you to play with.  This patch is what
> was previously discussed with one addition.  I heard through
> the grapevine (credit Marc) that one problem was related to 
> use of alias names (the loveable "blah~1" type names).  Therefore,
> I had another piece of code up my sleeve to address another problem,
> but I'm putting it on the table now because it will solve this problem.

This doesn't want to apply for me; the util_win32.c chunks fail
completely.  

Could you either see if that may be something on your end or just send the
whole util_win32.c file?

A few notes:

Get rid of the ap_assert()s.  There is no need for them and it is bad 
coding practice to use them for "normal" error conditions.  Yea, I 
know, they are a carryover from the existing code but it is broken.

Are you sure you want to return "" if the arg is too long for
the buffer in ap_os_systemcase_filename?  

Other than that, the concept looks reasonable and if it actually works
all the better...  I can't say I fully understand all the issues on 
Win32 since it is such a pain, but...  I just wish there were some 
documented way to compare file paths on Win32.  I'm sure there are
more magic cases that we don't know about.

[PATCH] RE: Canonical filename overhaul

Posted by Ken Parzygnat <kp...@raleigh.ibm.com>.

Ok, here's the patch for you to play with.  This patch is what
was previously discussed with one addition.  I heard through
the grapevine (credit Marc) that one problem was related to 
use of alias names (the loveable "blah~1" type names).  Therefore,
I had another piece of code up my sleeve to address another problem,
but I'm putting it on the table now because it will solve this problem.

The additional function is called: ap_os_systemcase_filename.  
Given a pathname, this function will return the pathname in the
case that exists on the file system.  It also has the benefit of
decoding alias names.

I have another patch coming that I didn't want to include here
because this one is so big.  In testing, I tried to set up a 
<Directory> container to provide basic authentication for
a UNC name, and it failed.  The patch I will post next is
a slight modification to get_path_info and ap_no2slash
and will fix this problem.

There are patches for three files here:
http_request.c
httpd.h
util_win32.c
Diff made the changes look a little worse than they are, so it's 
hard to read here.


--- .\include\httpd.h.first	Wed Oct 07 13:12:20 1998
+++ .\include\httpd.h	Fri Oct 16 15:33:58 1998
@@ -998,8 +998,12 @@

 #ifndef HAVE_CANONICAL_FILENAME
 #define ap_os_canonical_filename(p,f)  (f)
+#define ap_os_case_canonical_filename(p,f)  (f)
+#define ap_os_systemcase_filename(p,f)  (f)
 #else
 API_EXPORT(char *) ap_os_canonical_filename(pool *p, const char *file);
+API_EXPORT(char *) ap_os_case_canonical_filename(pool *pPool, const char *szFile);
+API_EXPORT(char *) ap_os_systemcase_filename(pool *pPool, const char *szFile);
 #endif

 #ifdef _OSD_POSIX
--- .\main\http_request.c.first	Thu Oct 15 13:12:20 1998
+++ .\main\http_request.c	Fri Oct 16 15:20:38 1998
@@ -359,16 +359,19 @@
         return OK;
     }

-    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
-    test_filename = ap_pstrdup(r->pool, r->filename);
-
-    ap_no2slash(test_filename);
-    num_dirs = ap_count_dirs(test_filename);
+    r->filename   = ap_os_case_canonical_filename(r->pool, r->filename);

     res = get_path_info(r);
     if (res != OK) {
         return res;
     }
+
+    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
+
+    test_filename = ap_pstrdup(r->pool, r->filename);
+
+    ap_no2slash(test_filename);
+    num_dirs = ap_count_dirs(test_filename);

     if ((res = check_safe_file(r))) {
         return res;
--- .\os\win32\util_win32.c.first	Thu Oct 01 07:12:24 1998
+++ .\os\win32\util_win32.c	Fri Oct 16 15:32:08 1998
@@ -5,212 +5,273 @@
 #include "httpd.h"
 #include "http_log.h"

-/* Returns TRUE if the path is real, FALSE if it is PATH_INFO */
-static BOOL sub_canonical_filename(char *szCanon, unsigned nCanon,
-				   const char *szInFile)
+/* Returns TRUE if the input string is a string
+ * of one or more '.' characters.
+ */
+static BOOL OnlyDots(char *pString)
+{
+    char *c;
+
+    if (*pString == '\0')
+        return FALSE;
+
+    for (c = pString;*c;c++)
+        if (*c != '.')
+            return FALSE;
+
+    return TRUE;
+}
+
+/* Accepts as input a pathname, and tries to match it to an
+ * existing path and return the pathname in the case that
+ * is present on the existing path.  This routine also
+ * converts alias names to long names.
+ */
+API_EXPORT(char *) ap_os_systemcase_filename(pool *pPool,
+                                             const char *szFile)
 {
     char buf[HUGE_STRING_LEN];
-    int n;
-    char *szFilePart;
-    char *s;
-    int nSlashes;
-    WIN32_FIND_DATA d;
-    HANDLE h;
-    const char *szFile;
-
-    szFile = szInFile;
-    s = strrchr(szFile, '\\');
-    for (nSlashes = 0; s > szFile && s[-1] == '\\'; ++nSlashes, --s)
-	;
-
-    if (strlen(szFile)==2 && szFile[1]==':') {
-        /*
-         * If the file name is x:, do not call GetFullPathName
-         * because it will use the current path of the executable
+    char *pInputName;
+    char *p, *q;
+    BOOL bDone = FALSE;
+    BOOL bFileExists = TRUE;
+    HANDLE hFind;
+    WIN32_FIND_DATA wfd;
+
+    ap_assert(szFile && strlen(szFile) < sizeof buf);
+    if (!szFile || strlen(szFile) == 0 || strlen(szFile) >= sizeof(buf))
+        return ap_pstrdup(pPool, "");
+
+    buf[0] = '\0';
+    pInputName = ap_pstrdup(pPool, szFile);
+
+    /* First convert all slashes to \ so Win32 calls work OK */
+    for (p = pInputName; *p; p++) {
+        if (*p == '/')
+            *p = '\\';
+    }
+
+    p = pInputName;
+    /* If there is drive information, copy it over. */
+    if (pInputName[1] == ':') {
+        buf[0] = tolower(*p++);
+        buf[1] = *p++;
+        buf[2] = '\0';
+
+        /* If all we have is a drive letter, then we are done */
+        if (strlen(pInputName) == 2)
+            bDone = TRUE;
+    }
+
+    q = p;
+    if (*p == '\\') {
+        p++;
+        if (*p == '\\')  /* Possible UNC name */
+        {
+            p++;
+            /* Get past the machine name.  FindFirstFile */
+            /* will not find a machine name only */
+            p = strchr(p, '\\');
+            if (p)
+            {
+                p++;
+                /* Get past the share name.  FindFirstFile */
+                /* will not find a \\machine\share name only */
+                p = strchr(p, '\\');
+                if (p) {
+                    strncat(buf,q,p-q);
+                    q = p;
+                    p++;
+                }
+            }
+
+            if (!p)
+                p = q;
+        }
+    }
+
+    p = strchr(p, '\\');
+
+    while (!bDone) {
+        if (p)
+            *p = '\0';
+
+        if (strchr(q, '*') || strchr(q, '?'))
+            bFileExists = FALSE;
+
+        /* If the path exists so far, call FindFirstFile
+         * again.  However, if this portion of the path contains
+         * only '.' charaters, skip the call to FindFirstFile
+         * since it will convert '.' and '..' to actual names.
+         * Note: in the call to OnlyDots, we may have to skip
+         *       a leading slash.
          */
-        strcpy(buf,szFile);
-        n = strlen(buf);
-        szFilePart = buf + n;
-    }
-    else {
-        n = GetFullPathName(szFile, sizeof buf, buf, &szFilePart);
-    }
-    ap_assert(n);
-    ap_assert(n < sizeof buf);
-
-    /*
-     * There is an implicit assumption that szInFile will contain a '\'.
-     * If this is not true (as in the case of <Directory *> or
-     * <File .htaccess>) we would assert in some of the code below.  Therefore,
-     * if we don't get any '\' in the file name, then use the file name we get
-     * from GetFullPathName, because it will have at least one '\'.  If there
-     * is no '\' in szInFile, it must just be a file name, so it should be
-     * valid to use the name from GetFullPathName.  Be sure to adjust the
-     * 's' variable so the rest of the code functions normally.
-     * Note it is possible to get here when szFile == 'x:', but that is OK
-     * because we will bail out of this routine early.
-     */
-    if (!s) {
-        szFile = buf;
-        s = strrchr(szFile, '\\');
-    }
-
-    /* If we have \\machine\share, convert to \\machine\share\ */
-    if (buf[0] == '\\' && buf[1] == '\\') {
-	char *s = strchr(buf + 2, '\\');
-	if (s && !strchr(s + 1, '\\')) {
-	    strcat(s + 1, "\\");
-	}
-    }
-
-    if (!strchr(buf, '*') && !strchr(buf, '?')) {
-        h = FindFirstFile(buf, &d);
-        if (h != INVALID_HANDLE_VALUE) {
-            FindClose(h);
-	}
-    }
-    else {
-        h = INVALID_HANDLE_VALUE;
-    }
-
-    if (szFilePart < buf + 3) {
-	ap_assert(strlen(buf) < nCanon);
-        strcpy(szCanon, buf);
-	/* a \ at the start means it is UNC, otherwise it is x: */
-	if (szCanon[0] != '\\') {
-	    ap_assert(ap_isalpha(szCanon[0]));
-	    ap_assert(szCanon[1] == ':');
-	    szCanon[2] = '/';
-	}
-	else {
-	    char *s;
-
-	    ap_assert(szCanon[1] == '\\');
-	    for (s = szCanon; *s; ++s) {
-		if (*s == '\\') {
-		    *s = '/';
-		}
-	    }
-	}
-        return TRUE;
-    }
-    if (szFilePart != buf + 3) {
-        char b2[_MAX_PATH];
-	char b3[_MAX_PATH];
-        ap_assert(szFilePart > buf + 3);
-	/* avoid SEGVs on things like "Directory *" */
-	ap_assert(s >= szFile && "this is a known bug");
-
-	memcpy(b3, szFile, s - szFile);
-	b3[s - szFile] = '\0';
-
-/*        szFilePart[-1] = '\0'; */
-        sub_canonical_filename(b2, sizeof b2, b3);
-
-	ap_assert(strlen(b2)+1 < nCanon);
-        strcpy(szCanon, b2);
-        strcat(szCanon, "/");
-    }
-    else {
-	ap_assert(strlen(buf) < nCanon);
-        strcpy(szCanon, buf);
-        szCanon[2] = '/';
-        szCanon[3] = '\0';
-    }
-    if (h == INVALID_HANDLE_VALUE) {
-	ap_assert(strlen(szCanon) + strlen(szFilePart) + nSlashes < nCanon);
-	for (n = 0; n < nSlashes; ++n) {
-	    strcat(szCanon, "/");
-	}
-        strcat(szCanon, szFilePart);
-	return FALSE;
-    }
-    else {
-	ap_assert(strlen(szCanon)+strlen(d.cFileName) < nCanon);
-        strlwr(d.cFileName);
-        strcat(szCanon, d.cFileName);
-	return TRUE;
+        if (bFileExists && !OnlyDots((*q == '.' ? q : q+1))) {
+            hFind = FindFirstFile(pInputName, &wfd);
+
+            if (hFind == INVALID_HANDLE_VALUE) {
+                bFileExists = FALSE;
+            }
+            else {
+                FindClose(hFind);
+
+                if (*q == '\\')
+                    strcat(buf,"\\");
+                strcat(buf, wfd.cFileName);
+            }
+        }
+
+        if (!bFileExists || OnlyDots((*q == '.' ? q : q+1))) {
+            strcat(buf, q);
+        }
+
+        if (p) {
+            q = p;
+            *p++ = '\\';
+            p = strchr(p, '\\');
+        }
+        else {
+            bDone = TRUE;
+        }
+    }
+
+    /* First convert all slashes to / so server code handles it ok */
+    for (p = buf; *p; p++) {
+        if (*p == '\\')
+            *p = '/';
     }
+
+    return ap_pstrdup(pPool, buf);
 }

-/* UNC requires backslashes, hence the conversion before canonicalisation.
- * Not sure how * many backslashes (could be that
- * \\machine\share\some/path/is/ok for example). For now, do them all.
+
+/*  Perform canonicalization with the exception that the
+ *  input case is preserved.
  */
-API_EXPORT(char *) ap_os_canonical_filename(pool *pPool, const char *szFile)
+API_EXPORT(char *) ap_os_case_canonical_filename(pool *pPool,
+                                                 const char *szFile)
 {
-    char buf[HUGE_STRING_LEN];
-    char b2[HUGE_STRING_LEN];
-    const char *s;
-    char *d;
-    int nSlashes = 0;
-
-    ap_assert(strlen(szFile) < sizeof b2);
-
-    /* Eliminate directories consisting of three or more dots.
-     * These act like ".." but are not detected by other machinery.
-     * Also get rid of trailing .s on any path component, which are ignored
-     * by the filesystem.  Simultaneously, rewrite / to \.
-     * This is a bit of a kludge - Ben.
+    char *pNewStr;
+    char *s;
+    char *p;
+    char *q;
+
+    ap_assert(szFile != NULL);
+    if (szFile == NULL || strlen(szFile) == 0)
+        return ap_pstrdup(pPool, "");
+
+    pNewStr = ap_pstrdup(pPool, szFile);
+
+    /*  Change all '\' characters to '/' characters.
+     *  While doing this, remove any trailing '.'.
+     *  Also, blow away any directories with 3 or
+     *  more '.'
      */
-    if (strlen(szFile) == 1) {
-        /*
-         * If the file is only one char (like in the case of / or .) then
-	 * just pass that through to sub_canonical_filename.  Convert a
-	 * '/' to '\\' if necessary.
-         */
-        if (szFile[0] == '/') {
-            b2[0] = '\\';
-	}
-        else {
-            b2[0] = szFile[0];
-	}
+    for (p = pNewStr,s = pNewStr; *s; s++,p++) {
+        if (*s == '\\' || *s == '/') {
+
+            q = p;
+            while (p > pNewStr && *(p-1) == '.')
+                p--;
+
+            if (p == pNewStr && q-p <= 2 && *p == '.')
+                p = q;
+            else if (p > pNewStr && p < q && *(p-1) == '/') {
+                if (q-p > 2)
+                    p--;
+                else
+                    p = q;
+            }

-        b2[1] = '\0';
+            *p = '/';
+        }
+        else {
+            *p = *s;
+        }
     }
-    else {
-        for (s = szFile, d = b2; (*d = *s); ++d, ++s) {
-	    if (*s == '/') {
-		*d = '\\';
-	    }
-	    if (*s == '.' && (s[1] == '/' || s[1] == '\\' || !s[1])) {
-		while (*d == '.') {
-		    --d;
-		}
-		if (*d == '\\') {
-		    --d;
-		}
-	    }
-	}
-
-        /* Finally, a trailing slash(es) screws thing, so blow them away */
-        for (nSlashes = 0; d > b2 && d[-1] == '\\'; --d, ++nSlashes)
-	    ;
-        /* XXXX this breaks '/' and 'c:/' cases */
-        *d = '\0';
-    }
-    sub_canonical_filename(buf, sizeof buf, b2);
-
-    buf[0] = ap_tolower(buf[0]);
-
-    if (nSlashes) {
-        /*
-         * If there were additional trailing slashes, add them back on.
-         * Be sure not to add more than were originally there though,
-         * by checking to see if sub_canonical_filename added one;
-         * this could happen in cases where the file name is 'd:/'
+    *p = '\0';
+
+    /*  Blow away any final trailing '.' since on Win32
+     *  foo.bat == foo.bat. == foo.bat... etc.
+     */
+    q = p;
+    while (p > pNewStr && *(p-1) == '.')
+        p--;
+    if ((p > pNewStr) ||
+        (p == pNewStr && q-p > 2))
+        *p = '\0';
+
+
+    /*  One more security issue to deal with.  Win32 allows
+     *  you to create long filenames.  However, alias filenames
+     *  are always created so that the filename will
+     *  conform to 8.3 rules.  According to the Microsoft
+     *  Developer's network CD (1/98)
+     *  "Automatically generated aliases are composed of the
+     *   first six characters of the filename plus ~n
+     *   (where n is a number) and the first three characters
+     *   after the last period."
+     *  Here, we attempt to detect and decode these names.
+     */
+    p = strchr(pNewStr, '~');
+    if (p != NULL) {
+        char *pConvertedName, *pQstr, *pPstr;
+        char buf[HUGE_STRING_LEN];
+        /* We potentially have a short name.  Call
+         * ap_os_systemcase_filename to examine the filesystem
+         * and possibly extract the long name.
          */
-        ap_assert(strlen(buf)+nSlashes < sizeof buf);
+        pConvertedName = ap_os_systemcase_filename(pPool, pNewStr);

-        if (nSlashes && buf[strlen(buf)-1] == '/')
-            nSlashes--;
+        /* Since we want to preserve the incoming case as much
+         * as we can, compare for differences in the string and
+         * only substitute in the path names that changed.
+         */
+        if (stricmp(pNewStr, pConvertedName)) {
+            buf[0] = '\0';
+
+            q = pQstr = pConvertedName;
+            p = pPstr = pNewStr;
+            do {
+                q = strchr(q,'/');
+                p = strchr(p,'/');
+
+                if (p != NULL) {
+                    *q = '\0';
+                    *p = '\0';
+                }
+
+                if (stricmp(pQstr, pPstr))
+                    strcat(buf, pQstr);   /* Converted name */
+                else
+                    strcat(buf, pPstr);   /* Original name  */
+
+
+                if (p != NULL) {
+                    pQstr = q;
+                    pPstr = p;
+                    *q++ = '/';
+                    *p++ = '/';
+                }
+
+            } while (p != NULL);

-        while (nSlashes--) {
-            strcat(buf, "/");
+            pNewStr = ap_pstrdup(pPool, buf);
         }
     }

-    return ap_pstrdup(pPool, buf);
+
+    return pNewStr;
+}
+
+/*  Perform complete canonicalization.
+ */
+API_EXPORT(char *) ap_os_canonical_filename(pool *pPool, const char *szFile)
+{
+    char *pNewName;
+    pNewName = ap_os_case_canonical_filename(pPool, szFile);
+    strlwr(pNewName);
+    return pNewName;
 }

 /* Win95 doesn't like trailing /s. NT and Unix don't mind. This works
@@ -225,19 +286,12 @@
 API_EXPORT(int) os_stat(const char *szPath, struct stat *pStat)
 {
     int n;
-
-    /* be sure it is has a drive letter or is a UNC path; everything
-     * _must_ be canonicalized before getting to this point.
-     */
-    if (szPath[1] != ':' && szPath[1] != '/') {
-	ap_log_error(APLOG_MARK, APLOG_ERR, NULL,
-		     "Invalid path in os_stat: \"%s\", "
-		     "should have a drive letter or be a UNC path",
-		     szPath);
-	return (-1);
+
+    if (strlen(szPath) == 0) {
+        return -1;
     }

-    if (szPath[0] == '/') {
+    if (szPath[0] == '/' && szPath[1] == '/') {
 	char buf[_MAX_PATH];
 	char *s;
 	int nSlashes = 0;




- - - - - - - - - - - - - - - - - -
Ken Parzygnat
email: kparz@raleigh.ibm.com 










--------------------------------------------------------------------------------
Users of the Apache webserver are hereby granted a non-exclusive, irrevocable,
world-wide, royalty-free, non-transferable license to use, execute, prepare
derivative works of, and distribute (internally and externally, and including
derivative works) the code accompanying this license as part of, and
integrated into the Apache webserver.  This code is provided "AS IS" WITHOUT
WARRANTY OF ANY KIND EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
THE IMPLIED WARRANTY OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
AND ANY WARRANTY OF NON-INFRINGEMENT.  THE ENTIRE RISK ARISING OUT OF THE USE
OR PERFORMANCE OF THIS CODE REMAINS WITH USERS OF THE APACHE WEBSERVER.  The
owner of this code represents and warrants that it is legally entitled to
grant the above license.

Re: Canonical filename overhaul

Posted by Marc Slemko <ma...@znep.com>.

On Wed, 14 Oct 1998, Ken Parzygnat wrote:

> ATTENTION: LONG POST!!!!
> 
> Time to stir up this hornets nest again.
> 
> I've been working on Apache PR's for Win32 for
> a short time, but I've found that about 90% of
> the problems are boiling down to a bug in
> ap_os_canonical_filename.

Yes.  Two million thanks and three million free copies of Apache go to
anyone who fixes this.

> 
> There are still problems lying around that
> relate to this, and there are still more to 
> come.  For example, the <Files> directive seems
> to be completely broken on Win32.

It does work sometimes when you have the luck to be in the right directory
using or not using the right wildcards.  But yea, it is broken.

[...]
> One thing to keep in mind is that the UNIX
> ap_os_canonical_filename is a no-op.  That is,
> it returns exactly what it gets.  This is 

You can argue that there are a couple of things that an
ap_os_canonical_filename should do on Unix if it were to do what it name
says; ie. handle "." and "..".  But, since those are resolved elsewhere
and have HTTP semantics it doesn't.

> partly why I believe the Win32 version needs
> to be very simple, and only do what it minimally
> needs to do to transform a path into a 
> well known format.

Right.  All we need is to be able to compare two names.  This seems like a
darn simple thing, but the last MS person I asked about it was astounded
by the concept and went off in wonder.

> Path: [*.perl]
> Old : [f:/work/canon/debug/*.perl]
> 
> This really seems wrong to me.  ap_os_canonical_filename
> has defaulted the current drive and current path.  This
> is the primary reason that the <Files> directive is broken
> on Win32.  In fact, the current drive and current path
> are defaulted onto any string which is not a UNC name.  It
> seems that we should never do this, we should only be 
> getting default drive and path information from ServerRoot
> or DocumentRoot.  

Right, nothing can depend on the current directory and we can (ie. should)
never (in request handling) change directories.

> Path: [..]
> Old : [ASSERT ERROR!!!!!!!]
> 
> These are just a few of the cases that just
> assert with the current routine.  The list

Yea, well, 90% of the asserts in the win32 code should be shot in the
head.

> Knowing that the Unix version of 
> ap_os_canonical_filename is a no-op means that
> there is lots of code in the server to do basic
> path crunching (i.e. '..' and '.' and '///' etc).

Right, although you can think of that as not really being filesystem path
but pseudo HTTP stuff.

> 1. Change '\' to '/'
> 2. Eliminate trailing '.' characters.  I felt this was
>    important as a security issue since Win32 will ignore
>    a trailing '.'.

What about spaces?

> 3. Eliminate directories with more than 2 '.'s.  While on
>    Unix it is valid to create a directory '...', it is 
>    not on Win32.  This is a very suspect directory, and
>    therefore, as a security issue, I've opted to eliminate
>    it.  This could be up for discussion.

We already deal with that, right?  Probably the same code that is
stripping "..".  This comes up, of course, because in some situations on
Win32, "..." can actually refer to the parent or the parent of the parent.

> --- http_request.c.first	Tue Oct 06 19:12:24 1998
> +++ http_request.c	Tue Oct 13 21:57:14 1998
> @@ -359,16 +359,19 @@
>          return OK;
>      }
>  
> -    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
> -    test_filename = ap_pstrdup(r->pool, r->filename);
> -
> -    ap_no2slash(test_filename);
> -    num_dirs = ap_count_dirs(test_filename);
> +    r->filename   = ap_os_case_canonical_filename(r->pool, r->filename);
>  
>      res = get_path_info(r);
>      if (res != OK) {
>          return res;
>      }
> +
> +    r->filename   = ap_os_canonical_filename(r->pool, r->filename);
> +
> +    test_filename = ap_pstrdup(r->pool, r->filename);
> +
> +    ap_no2slash(test_filename);
> +    num_dirs = ap_count_dirs(test_filename);
>  
>      if ((res = check_safe_file(r))) {
>          return res;

I haven't yet gone through this, but the idea sounds good.  I will have to
look at exactly what is being done, since it is not intuitively obvious
that the required places will have the required case, but..