You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Francis Daly <de...@daoine.org> on 2001/04/05 22:29:27 UTC

[PATCH] mod_negotiation and order of suffixes

Hi there,

for your consideration, appended to this mail is a patch to remove
the requirements on the order of suffixes when using MultiViews /
mod_negotiation.  This corresponds to (part of) PR3430.

The patch is relative to the version of mod_negotiation.c distributed
with apache-2.0.15.  I believe that's identical to the one from
apache-2.0.16.

But first, some notes:

The current method takes the "file" part of r->filename (either the bit
after the final / in the URL, or the value of DirectoryIndex).  First,
if the exact filename matches, mod_negotiation declines to handle it.
Second, for each file in the directory, it checks for (regex syntax)
/^file\./, and only considers ones that match.

This patched method, for the requested file "file" does the same thing
after a single extra if(strchr()).

However, if the r->filename is actually "file.s1.s2.sZ" (with dots), the
current way looks for /^file\.s1\.s2\.sZ\./; the patched way looks for
each of /^file\./, /\.s1/, /\.s2/, /\.sZ/.  It bails out at the first
failure.

In this case, the patched code does an extra strchr, strlen, strstr,
some pointer arithmetic and pointer movement, changes a character, and
changes it back.  Per dot in r->filename, per file in the directory.  I
don't have numbers for how expensive that extra string manipulation is.

Some consequences of this implementation are:

Current method: file "name.en.html" is only accessible through (partial)
URIs "name", "name.en", or "name.en.html"

Patched method: The same three work, as do "name.html" and
"name.html.en".  That's good.  However: so do "name.htm", "name.htm.en",
and "name.en.htm".  That may or may not be considered good.  More however:
so do "name.h", "name.h.h", "name...h.e.e..e.h.h.", and an infinite
number of similar variations.  That may not be considered good.

[ side note -- most of that infinitude could be eliminated, if desired,
by (for example) checking that the length of r->filename (prefix_len, in
the code) is not more than the length of dirent.name, immediately before
the while loop which looks for dots in filp.  I would consider that an
enhancement to, rather than an integral part of, the patch, so didn't
include it here.  Opinions may differ ]

In each case, the content is returned with a Content-Location: header
indicating the canonical filename.

The requirements are (1)r->filename up to the first dot must match the
real filename up to the first dot; (2)each .suffix in r->filename must
exist (string match) in the real filename; (3)the real filename must
correspond to a known mime-type, encoding, etc -- which I think means
that the final suffix must be known, and only suffixes followed by known
suffixes are considered.

As a real example, testing with the apache "It worked!" page (named
index.html.LANG), if I request index.html.fr, I get the page back.
If I request index.fr.html, or just index.fr, I get back the 406 Not
Acceptable page, with a link to index.html.fr, _unless_ I include fr
as an acceptable language.  That's PR6282, which is mentioned but not
addressed in this patch.  If I include fr as a language, I can request
/index.fr, /index.fr.html, or /index.html.fr successfully.  If I include
fr as my preferred language, I can additionally request /, /index, and
/index.html.  (As well as the .h, .ht, .htm, .f variants referred to
earlier).  If I request /index.d, I get a 406 with links to index.html.de
and index.html.dk

As a faked example, consider five files in the DocumentRoot, with no
special customisations to the (MIME) configuration:

files a.b.c, d.e.html, g.h.i.j.k.en, m.n.o.p.q.html, s.t.html.u.v

The following requests have the indicated results:

GET /a            -> not found
GET /a.b          -> not found
GET /a.c          -> not found
GET /a.b.c        -> success
GET /d            -> success
GET /d.e          -> success
GET /d.h          -> success
GET /d.html       -> success
GET /d....html    -> success
GET /g            -> not found
GET /g.h          -> not found
GET /g.h.i.j.k    -> not found
GET /g.h.i.j.k.en -> success
GET /g.h.i.k.j.en -> not found
GET /m            -> success
GET /m.html       -> success
GET /m.o.q.p.n    -> success
GET /m.o.r.p.n    -> not found
GET /s.t.html.u.v -> success
GET /s            -> not found
GET /s.t.html.u   -> not found

note that in the "not found" cases there (except for /m.o.r.p.n), the
patched code does pass the file down as being potentially valid --
it's later code which decides that it doesn't know how to treat the
final suffix, and fails it.

As another faked example, with files ..d.f.html and .e.txt, I can
successfully issue GETs for /.d, /.f, /.h, /.e and /.t, as well as
things like /....t. (whether or not the final . there is punctuation). 

So that's it.  Any comments, criticism, or ridicule related to the
patch, please send my way, or to the list.

All the best,

	f
-- 
Francis Daly        deva@daoine.org


--- modules/mappers/mod_negotiation.c.orig	Wed Apr  4 18:59:20 2001
+++ modules/mappers/mod_negotiation.c	Thu Apr  5 20:51:13 2001
@@ -911,6 +911,9 @@
     struct var_rec mime_info;
     struct accept_rec accept_info;
     void *new_var;
+    char *pos;
+    int pos_len;
+    int not_this_dirent;
 
     clean_var_rec(&mime_info);
 
@@ -935,12 +938,76 @@
         request_rec *sub_req;
         
         /* Do we have a match? */
-        if (strncmp(dirent.name, filp, prefix_len)) {
-            continue;
-        }
-        if (dirent.name[prefix_len] != '.') {
-            continue;
+
+        if ((pos = strchr(filp, '.'))) {
+
+        /* Given "name.suf1.suf2.suffix", check for "name." */
+
+            pos_len = pos - filp + 1;
+            if (strncmp(dirent.name, filp, pos_len)) {
+                continue;
+            }
+
+            not_this_dirent = 0;
+            filp = ++pos;
+
+        /* Check for each internal ".sufN" from r->filename */
+            while ((pos = strchr(filp, '.'))) {
+                --filp;
+                pos_len = pos - filp ;
+                filp[pos_len]='\0';
+                if (!strstr(dirent.name, filp)) {
+                    not_this_dirent=1;
+                }
+
+        /* XXX: Right now, filp points to a suffix (encoding indicator,
+         * handler indicator, mime-type indicator, whatever), starting
+         * with a ".". If we want to do stuff, like consider that to be
+         * an implicit additional Accept: header, here would be a good
+         * place to do it.  See PR6282 for an example of what I mean.
+         * Note, this would have to be repeated once more, just after the
+         * check for the final ".suffix" and before filp gets moved back
+         * again. 
+         */
+
+                filp[pos_len] = '.';
+                filp += pos_len + 1;
+                
+                if (not_this_dirent) {
+                    /* get to next dirent */
+                    break;
+                }
+            }
+            if (not_this_dirent) {
+                /* reset filp */
+                pos_len = strlen(filp);
+                filp -= prefix_len - pos_len;
+                /* next dirent */
+                continue;
+            }
+            --filp;
+            pos_len = strlen(filp);
+
+        /* Check for the final ".suffix" from r->filename */
+            if (!strstr(dirent.name, filp)) {
+                filp -= prefix_len - pos_len;
+                continue;
+            }
+            filp -= prefix_len - pos_len;
+
+        } else {
+
+        /* Alternatively, given just "name", check for "name." 
+         * Just like it used to be  
+         */
+            if (strncmp(dirent.name, filp, prefix_len)) {
+                continue;
+            }
+            if (dirent.name[prefix_len] != '.') {
+                continue;
+            }
         }
+
 
         /* Yep.  See if it's something which we have access to, and 
          * which has a known type and encoding (as opposed to something

Re: [PATCH] mod_negotiation and order of suffixes

Posted by Francis Daly <de...@daoine.org>.
On Thu, Apr 05, 2001 at 09:29:27PM +0100, Francis Daly wrote:

Replying to my own post, to add to it before I go away for the
weekend...

> The patch is relative to the version of mod_negotiation.c distributed
> with apache-2.0.15.  I believe that's identical to the one from
> apache-2.0.16.

There are a few enhancements to this patch, which relate to optimising
for the normal case.  I've attached three individual patches to it, as
described below -- attached, because I don't see how to cleanly append
them while keeping them distinct.  Hopefully that won't break for
anyone.

The first moves a constant calculation out of a loop -- at the cost of
one int and one strchr() in each request, it saves one strchr() per
file in the directory, if the request contains no dots.

The second requires the first to have been applied.  It moves another
constant calculation out of a loop -- at the cost of a further
strchr() per request where the request contains at least one dot, it
saves one strchr() per file, if the request contains exactly one dot.

Without having numbers to back me up, I'd dare suggest that the normal
case for mod_negotiation and faked type-maps involves requests with 0
or 1 dots (index or index.html, rather than index.html.en) in
directories containing enough files to make both patches worthwhile.

If that's not the normal case, then either or both patches can be
ignored.

The first patch is relative to the distributed mod_negotiation.c from
apache-2.0.15, patched with the "order" diff from
<20...@kerna.ie>.  The second patch is relative to the
result of the first patch.  In fact, the second is a diff -u -b,
because the main change is surrounding a while loop with an if
statement, and consequently indenting the loop.  I presume it's clearer
to use the -b here.

The third patch is independent of the first two -- it implements the
side note from the original mail.  The effect of this is to limit the
number of different URLs which will return the same Content-Location,
and also prevent the server working too hard when a malicious request
with a thousand trailing dots is received.

As provided here, it is a diff against the same code as the first
patch; it can be applied equally to the output of the first or second
patch, modulo a warning about line offsets.

One thing I'm not certain about, is where precisely the three lines of
the third patch should go -- depending on the common requests and
directories, it may be more efficient overall to have it on either
side of one of the if(dots_in_request) tests.


Each of those are changes to the original patch.  If preferred, I can
provide a single patch against the distributed mod_negotiation.c which
includes all of the changes.

Oh, and having thought about it properly, it seems clear that the
mention of PR6282 was in the wrong place -- if anywhere, before the
while loop reading into &dirent would be the place to take care of it.
But that's a completely separate problem.

Finally, a coding question, if I may:

is there a reason (stylistic or otherwise), when given

    int var = 2; /* i.e., neither 0 nor 1 */

to prefer one of

    --------------------
    if (var) 
        ; /* var != 0 */
    else
        ; /* var == 0 */
    --------------------
    if (!var)
        ; /* var == 0 */
    else
        ; /* var != 0 */
    --------------------
    if (!!var) 
        ; /* var != 0 */
    else
        ; /* var == 0 */
    --------------------

over the others?

For a real example, see the second diff below and dots_in_request.
I've used the first method, but don't know enough about portability to
be certain that it is as safe as the third method.

Thanks for your time,

	f
-- 
Francis Daly        deva@daoine.org