Posted to bugs@httpd.apache.org by bu...@apache.org on 2007/10/05 18:00:10 UTC

DO NOT REPLY [Bug 43562] New: - PR 21059 is not needed anymore and has degraded the use of mod_speling; please suppress

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG-RELATED
COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=43562>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=43562

           Summary: PR 21059 is not needed anymore and has degraded the use
                    of mod_speling; please suppress
           Product: Apache httpd-2
           Version: 2.2.6
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P2
         Component: mod_speling
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: jose@w3.org


Please remove the PR 21059 change from mod_speling: it has degraded
the behavior of Apache 2.2, and bug 21059 itself was solved by changes
elsewhere in the code. The patch I'm submitting removes the PR 21059
change.

Detailed report:

Way back in 2003, the following dependency was added to
mod_speling:check_speling(), as an attempt to fix bug report # 21059
[1] for Apache 2.0.45. This patch was apparently applied only to the
2.2 tree:

[[
    /* we default to reject path info (same as core handler) */
    if ((r->used_path_info != AP_REQ_ACCEPT_PATH_INFO) &&
        r->path_info && *r->path_info) {
        return DECLINED;
    }
]]

This patch has had a bad side effect on Apache 2.2: mod_speling now
depends on AcceptPathInfo; it is no longer possible to use mod_speling
without turning AcceptPathInfo on.
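
To make the effect of the quoted condition concrete, here is a minimal,
self-contained sketch. It is not the mod_speling source: the struct, the
enum names and the return values are hypothetical stand-ins for
request_rec, the AP_REQ_*_PATH_INFO constants and DECLINED, and only the
quoted condition is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the httpd path-info modes (assumed names). */
enum sketch_path_info_mode {
    SKETCH_ACCEPT_PATH_INFO,   /* AcceptPathInfo On */
    SKETCH_REJECT_PATH_INFO,   /* AcceptPathInfo Off */
    SKETCH_DEFAULT_PATH_INFO   /* AcceptPathInfo Default */
};

/* Hypothetical result codes; the real handler returns DECLINED. */
enum sketch_result { SKETCH_CONTINUE = 0, SKETCH_DECLINED = -1 };

/* Trimmed-down stand-in for request_rec (assumption, two fields only). */
struct sketch_request {
    enum sketch_path_info_mode used_path_info;
    const char *path_info;     /* trailing part of the URI, may be NULL */
};

/* Mirrors only the condition quoted in the report. */
int sketch_path_info_guard(const struct sketch_request *r)
{
    if (r->used_path_info != SKETCH_ACCEPT_PATH_INFO
        && r->path_info && *r->path_info) {
        return SKETCH_DECLINED;   /* no spelling correction is attempted */
    }
    return SKETCH_CONTINUE;       /* the rest of the check would run */
}
```

With a non-empty path_info, the guard lets the request through to the
spelling check only when the mode is the "accept" one; with an empty or
NULL path_info it never declines.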

With all due respect, I believe that the better approach would have
been to add a directive similar to mod_mime's ModMimeUsePathInfo [2],
which "is used to combine the filename with the path_info URL
component to apply mod_mime's directives to the request." This
requires a deeper code change than what the patch proposed.

This patch has also degraded the behavior of Apache 2.2: if you use
mod_speling, URIs that returned status 404 under Apache 1.3 now
return status 200, because you need to enable AcceptPathInfo to be
able to use mod_speling.

Looking at bug report 21059, I see that it describes standard
behavior in Apache 1.3:

  1. the client passed a path_info in its request
      http://www.abcdatos.com/programas/traductores/
  2. mod_negotiation (and not mod_speling), converted the URI to:
      http://www.abcdatos.com/programas/traductores/"

The real bug here, according to the report, is that the server returns
404 regardless of whether the user has enabled AcceptPathInfo. Up to
this point, mod_speling has not been involved: there has been no
spelling correction.

Let's jump back to the present and to Apache 2.2. I ran the following
tests (detailed results below):

1. Apache 1.3 with mod_speling
2. Apache 2.2.4; mod_speling with PR 21059 and AcceptPathInfo enabled
3. Apache 2.2.4; mod_speling without PR 21059 and AcceptPathInfo disabled
4. Apache 2.2.4; mod_speling without PR 21059 and AcceptPathInfo enabled

To summarize the results, removing the patch to mod_speling and
disabling AcceptPathInfo gives us the same behavior as with Apache
1.3. Moreover, removing the patch and enabling AcceptPathInfo does
pass the path info to the resource, thus showing that the behavior
described in bug report 21059 was fixed elsewhere in the code and that
PR 21059 is not needed.
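
As a cross-check of this summary, the first-response status codes from
the detailed tests further down can be put into a small table and
compared against the Apache 1.3 baseline. This is only a sketch: the
array and function names are made up for illustration, and the numbers
are transcribed from the detailed tests, not measured again.

```c
#include <stddef.h>

#define N_URIS 13

/* Request paths on www.w3.org, in the order used by the arrays below. */
const char *const uris[N_URIS] = {
    "/INSTALL.html", "/INSTALL", "/Install", "/Install.html",
    "/Install.HTML", "/INSTALL/", "/INSTALL.html/", "/Install/",
    "/Install.html/", "/INSTALL.html/foo", "/INSTALL/foo",
    "/Install.html/foo", "/Install/foo"
};

/* 1. Apache 1.3.37 */
const int apache13[N_URIS] = {
    200, 200, 300, 301, 301, 404, 404, 300, 301, 404, 404, 301, 300
};
/* 2. Apache 2.2.4, with PR 21059, AcceptPathInfo enabled */
const int with_pr_on[N_URIS] = {
    200, 200, 300, 301, 301, 200, 200, 300, 301, 200, 200, 301, 300
};
/* 3. Apache 2.2.4, without PR 21059, AcceptPathInfo disabled */
const int without_pr_off[N_URIS] = {
    200, 200, 300, 301, 301, 404, 404, 300, 301, 404, 404, 301, 300
};
/* 4. Apache 2.2.4, without PR 21059, AcceptPathInfo enabled */
const int without_pr_on[N_URIS] = {
    200, 200, 300, 301, 301, 404, 404, 300, 301, 200, 200, 301, 300
};

/* Count the URIs whose status differs from the baseline. */
size_t count_differences(const int *baseline, const int *other, size_t n)
{
    size_t i, diff = 0;
    for (i = 0; i < n; i++) {
        if (baseline[i] != other[i]) {
            diff++;
        }
    }
    return diff;
}
```

Comparing configuration 3 with the 1.3 baseline gives no differences,
while configurations 2 and 4 differ only on URIs that carry path info.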

There have been other bug reports about the side effects and nuisance
of binding mod_speling to a global AcceptPathInfo (e.g., # 38635 [3]
and mailing list posts).

I strongly advise removing this dependency and, when resources are
available, adding a better integration of AcceptPathInfo processing
into mod_speling, such as the one done in mod_mime via
ModMimeUsePathInfo.

Cheers,

-jose

[1] http://issues.apache.org/bugzilla/show_bug.cgi?id=21059
[2] http://httpd.apache.org/docs/2.2/mod/mod_mime.html#modmimeusepathinfo
[3] http://issues.apache.org/bugzilla/show_bug.cgi?id=38635

---------------
detailed tests

We're requesting http://www.w3.org/INSTALL.html, using different versions
of the Apache server and different variations on the URI.

--------------

1. Apache 1.3.37 (gives us our "standard" behavior)

*** Without '/' after the URI:

http://www.w3.org/INSTALL.html  --> 200  OK

http://www.w3.org/INSTALL  --> 200 OK 
  Content-Location: INSTALL.html

http://www.w3.org/Install --> 300 Multiple Choices (INSTALL.html)
http://www.w3.org/Install.html --> 301 Moved Permanently (INSTALL.html)
http://www.w3.org/Install.HTML --> 301 Moved Permanently (INSTALL.html)

** With '/' after the URI: 
  --> Content-Location header is not sent anymore

http://www.w3.org/INSTALL/ --> 404 Not Found
http://www.w3.org/INSTALL.html/ --> 404 Not Found
http://www.w3.org/Install/ --> 300 Multiple Choices (/INSTALL.html/)
http://www.w3.org/Install.html/ --> 301 Moved Permanently (INSTALL.html/)

  --> Conclusion: They all end up returning 404

** With '/foo' after the URI:

http://www.w3.org/INSTALL.html/foo --> 404 Not Found
http://www.w3.org/INSTALL/foo --> 404 Not Found
   --> No redirection
http://www.w3.org/Install.html/foo --> 301 Moved Permanently (INSTALL.html/foo)
http://www.w3.org/Install/foo --> 300 Multiple Choices (INSTALL.html/foo)

   --> Conclusion: They all end up returning 404

Summary: 

Only INSTALL returns valid content in the end; all the other
variations, INSTALL/ or INSTALL/foo, return 404.

==============

2. Apache 2.2.4 with AcceptPathInfo enabled; mod_speling contains PR 21059

*** Without '/' after the URI:

http://www.w3.org/INSTALL.html  --> 200  OK

http://www.w3.org/INSTALL  --> 200 OK 
  Content-Location: INSTALL.html

http://www.w3.org/Install --> 300 Multiple Choices (INSTALL.html)
http://www.w3.org/Install.html --> 301 Moved Permanently (INSTALL.html)
http://www.w3.org/Install.HTML --> 301 Moved Permanently (INSTALL.html)

** With '/' after the URI: 

http://www.w3.org/INSTALL/ --> 200 OK
http://www.w3.org/INSTALL.html/ --> 200 OK
http://www.w3.org/Install/ --> 300 Multiple Choices (/INSTALL.html/)
http://www.w3.org/Install.html/ --> 301 Moved Permanently (INSTALL.html/)

** With '/foo' after the URI:

http://www.w3.org/INSTALL/foo --> 200 OK
http://www.w3.org/INSTALL.html/foo --> 200 OK
http://www.w3.org/Install/foo --> 300 Multiple Choices (/INSTALL.html/foo)
http://www.w3.org/Install.html/foo --> 301 Moved Permanently (INSTALL.html/foo)

Summary: 

The server always returns status code 200 for URIs with path info if
we want to use mod_speling, because of the binding to AcceptPathInfo
being on.

There's no way to have the same behavior as under Apache 1.3.

=======

3. Apache 2.2.4 with AcceptPathInfo disabled; mod_speling without PR 21059

*** Without '/' after the URI:

http://www.w3.org/INSTALL.html  --> 200  OK

http://www.w3.org/INSTALL  --> 200 OK 
  Content-Location: INSTALL.html

http://www.w3.org/Install --> 300 Multiple Choices (INSTALL.html)
http://www.w3.org/Install.html --> 301 Moved Permanently (INSTALL.html)
http://www.w3.org/Install.HTML --> 301 Moved Permanently (INSTALL.html)

** With '/' after the URI: 

http://www.w3.org/INSTALL/ --> 404 Not Found
http://www.w3.org/INSTALL.html/ --> 404 Not Found
http://www.w3.org/Install/ --> 300 Multiple Choices (/INSTALL.html/)
http://www.w3.org/Install.html/ --> 301 Moved Permanently (INSTALL.html/)

** With '/foo' after the URI:

http://www.w3.org/INSTALL.html/foo --> 404 Not Found
http://www.w3.org/INSTALL/foo --> 404 Not Found
http://www.w3.org/Install.html/foo --> 301 Moved Permanently (INSTALL.html/foo)
http://www.w3.org/Install/foo --> 300 Multiple Choices (INSTALL.html/foo)

Conclusion:

Removing PR 21059 and disabling AcceptPathInfo gives us the same
behavior as under Apache 1.3. mod_speling no longer depends on the
AcceptPathInfo directive.

==========

4. Apache 2.2.4 with AcceptPathInfo enabled; mod_speling without PR 21059

*** Without '/' after the URI:

http://www.w3.org/INSTALL.html  --> 200  OK

http://www.w3.org/INSTALL  --> 200 OK 
  Content-Location: INSTALL.html

http://www.w3.org/Install --> 300 Multiple Choices (INSTALL.html)
http://www.w3.org/Install.html --> 301 Moved Permanently (INSTALL.html)
http://www.w3.org/Install.HTML --> 301 Moved Permanently (INSTALL.html)

** With '/' after the URI: 

http://www.w3.org/INSTALL/ --> 404 Not Found
http://www.w3.org/INSTALL.html/ --> 404 Not Found
http://www.w3.org/Install/ --> 300 Multiple Choices (/INSTALL.html/)
http://www.w3.org/Install.html/ --> 301 Moved Permanently (INSTALL.html/)

** With '/foo' after the URI:

http://www.w3.org/INSTALL.html/foo --> 200 OK
http://www.w3.org/INSTALL/foo --> 200 OK
http://www.w3.org/Install.html/foo --> 301 Moved Permanently (INSTALL.html/foo)
http://www.w3.org/Install/foo --> 300 Multiple Choices (INSTALL.html/foo)

Conclusion:

Removing PR 21059 and enabling AcceptPathInfo gives us the same
behavior as under Apache 1.3, with the addition of being able to pass
on the path info to the resource. This effectively shows that bug
#21059 was solved elsewhere in the code and that the PR 21059 change
in mod_speling is not related to it at all.

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 43562] - PR 21059 is not needed anymore and has degraded the use of mod_speling; please suppress

Posted by bu...@apache.org.





------- Additional Comments From jose@w3.org  2007-10-05 09:01 -------
Created an attachment (id=20923)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=20923&action=view)
removes PR 21059 from mod_speling. Patch against 2.2.6




DO NOT REPLY [Bug 43562] - PR 21059 is not needed anymore and has degraded the use of mod_speling; please suppress

Posted by bu...@apache.org.





------- Additional Comments From jose@w3.org  2007-10-05 09:01 -------
Created an attachment (id=20924)
 --> (http://issues.apache.org/bugzilla/attachment.cgi?id=20924&action=view)
removes PR 21059 from mod_speling. Patch against trunk




DO NOT REPLY [Bug 43562] - PR 21059 is not needed anymore and has degraded the use of mod_speling; please suppress

Posted by bu...@apache.org.


jose@w3.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |PatchAvailable






DO NOT REPLY [Bug 43562] - PR 21059 is not needed anymore and has degraded the use of mod_speling; please suppress

Posted by bu...@apache.org.





------- Additional Comments From ted@w3.org  2007-10-05 09:48 -------
One thing to add to this report is a rather serious side effect that
this mod_speling/pathinfo bug inadvertently introduces: infinitely
recursive URIs. Clueless crawlers, of which there are many, do not
notice from the ETag that what they are getting is identical to a
resource they have already indexed. When they parse a resource that
they accessed with a trailing / and come across a relative URI, they
naturally append the relative path to the URI they already have and
queue the result as a new resource to crawl.

For example:

http://www.w3.org/INSTALL.html/Library/src/Library/src/ --> 200 OK

http://www.w3.org/INSTALL.html/Library/src/Library/src/Library/src/ --> 200 OK

ad nauseam

Except perhaps as a teergrube exercise to trap and hold unsuspecting
crawlers hostage indefinitely, this is generally undesirable.
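
The growth of these URIs can be reproduced with a simplified version of
the relative-reference merge a crawler performs. This is a rough sketch
under stated assumptions: resolve_relative() is a hypothetical helper,
the base URI is assumed to have a path, and ".." segments, queries and
absolute references are ignored. The relative link "Library/src/" is
assumed from the example URIs above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Simplified merge: keep the base up to and including its last '/',
 * then append the relative reference.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int resolve_relative(const char *base, const char *rel,
                     char *out, size_t out_len)
{
    const char *last_slash = strrchr(base, '/');
    size_t keep = last_slash ? (size_t)(last_slash - base) + 1 : 0;
    int n = snprintf(out, out_len, "%.*s%s", (int)keep, base, rel);

    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    return 0;
}
```

Resolving "Library/src/" against http://www.w3.org/INSTALL.html/ keeps
the whole base and yields http://www.w3.org/INSTALL.html/Library/src/;
resolving it again against that result adds another Library/src/, and
so on. Against http://www.w3.org/INSTALL.html (no trailing slash) the
same link resolves to http://www.w3.org/Library/src/ and the loop never
starts.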

Regards,
