Posted to dev@tomcat.apache.org by Jason Hunter <jh...@acm.org> on 2001/09/11 07:07:11 UTC

Re: Tomcat 3.2.3 and getPathInfo

I'd still like to see Tomcat allow the slashes.  Here's my argument:

* Allowing two adjacent slashes to remain is not a security risk
* The Apache Web Server allows the slashes to remain
* Tomcat used to allow the slashes to remain
* Code (like mine) which used to work with Tomcat is now breaking
* It's breaking a book example too, which may cause lots of bug reports
(to both of us)
* Unless the spec says to normalize beyond what's necessary, Tomcat
shouldn't normalize beyond what's necessary

-jh-

Marc Saegesser wrote:
> 
> After looking into this further I've changed my mind.  I've tried this using
> other web servers (iPlanet, IIS 4.0 and 5.0) and in all cases the value in
> PATH_INFO has been fully normalized including removing adjacent /
> characters.  IIS gets the contents of PATH_INFO wrong, but it is fully
> normalized.  The CGI 1.1 specification is silent on this topic (like it is
> on most other important details).
> 
> I think we should leave Tomcat as it currently is in 3.2.3.  If you need to
> pass data to a servlet in the URL and that data *must not* be susceptible to
> URL normalization then the data *must* be in the query string.
> 
> Marc Saegesser
> 
> > -----Original Message-----
> > From: Jason Hunter [mailto:jhunter@acm.org]
> > Sent: Monday, August 27, 2001 8:45 PM
> > To: tomcat-dev@jakarta.apache.org
> > Subject: Re: Tomcat 3.2.3 and getPathInfo
> >
> >
> > Marc Saegesser wrote:
> > >
> > > Using Apache 1.3.19 here's what I see.  Apache does normalize the URL but
> > > there is a small difference between what it does and what Tomcat does.
> > > Apache does not remove multiple adjacent / characters.  For example,
> > >
> > > http://server/cgi-bin/script/fu/bar --> PATH_INFO = /fu/bar
> > > http://server/cgi-bin/script/fu/../bar --> PATH_INFO = /bar
> > > http://server/cgi-bin/script/fu//bar --> PATH_INFO = /fu//bar
> > >
> > > The multiple adjacent / characters don't seem to have any effect on locating
> > > resources.  For example,
> > >
> > > http://server///////cgi-bin/script/fu/bar
> > >
> > > works just fine.  Unless other committers feel otherwise, I'll work on
> > > changes to the tomcat_32 branch to make path info work as it does with CGI
> > > in Apache.
> >
> > Perfect, then my issue (at least) would be solved.
> >
> > -jh-

RE: Tomcat 3.2.3 and getPathInfo

Posted by Marc Saegesser <ma...@apropos.com>.
1)  A URL identifies a resource.  It doesn't matter whether the resource is
a static file, CGI, servlet, JSP, ASP or anything else, it simply names a
resource.  To say that some pieces of the URL should be normalized while
other pieces should not goes against this concept.  Any two URLs that name
the same resource can be considered identical and can freely be rewritten as
long as they continue to name the same resource.  Browsers do this all the
time and I assume that proxy servers may as well.

2)  The URLs http://server/a/b/../c, http://server/a/./c and
http://server/a/c all obviously refer to the same resource.  Now assume that
/a/c/* is a prefix mapping for a servlet.  Because all these URLs refer to
the same resource, they must all map to the servlet.  By normalizing the URL
we don't alter the resource being requested and we can now do the comparison
to test for a matching prefix.  Now assume that /a/* is a prefix mapping
for a servlet.  The same three URLs still name the same resource.  What
should the servlet expect to see in path info?  I suppose it might expect
any one of /b/../c, /./c or /c, but since those all map to the same thing
the servlet should still return the same result.  If it doesn't then it has
created more than one resource with the same name.  Every server that I
tested returned a path info of /c (the fully normalized version).
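
To make point 2 concrete, here is a minimal sketch of normalizing before
matching a prefix mapping.  It is not Tomcat's code: the class
PathInfoSketch and both of its methods are hypothetical names, and the
handling of ".." above the root and of trailing slashes is an assumption
made only to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper for illustration only; not taken from Tomcat.
class PathInfoSketch {

    // Resolve "." and ".." segments. Empty segments are skipped as well,
    // so adjacent slashes collapse and a trailing slash is dropped.
    static String normalize(String path) {
        Deque<String> segments = new ArrayDeque<>();
        for (String seg : path.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) {
                continue;
            }
            if (seg.equals("..")) {
                // Assumption: ".." above the root is ignored here; a real
                // container may reject such a request instead.
                segments.pollLast();
            } else {
                segments.addLast(seg);
            }
        }
        StringBuilder out = new StringBuilder();
        for (String seg : segments) {
            out.append('/').append(seg);
        }
        return out.length() == 0 ? "/" : out.toString();
    }

    // Path info for a prefix mapping such as /a/*, passed here as "/a".
    // Returns "" when nothing follows the prefix.
    static String pathInfo(String prefix, String rawPath) {
        String path = normalize(rawPath);
        if (path.equals(prefix)) {
            return "";
        }
        if (path.startsWith(prefix + "/")) {
            return path.substring(prefix.length());
        }
        throw new IllegalArgumentException(rawPath + " does not match " + prefix + "/*");
    }

    public static void main(String[] args) {
        // All three spellings from point 2 yield the same path info.
        for (String raw : new String[] {"/a/b/../c", "/a/./c", "/a/c"}) {
            System.out.println(raw + " -> " + pathInfo("/a", raw));
        }
    }
}
```

With the mapping /a/*, all three request paths produce a path info of /c,
which matches what the servers I tested returned.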

3)  The security problem comes not from the servlet prefix mapping but a
related issue of security-constraints specified by a url-pattern.  If /a/b/*
is protected then URLs like http://server/a/./b/c and
http://server/a/c/../b/d both refer to resources within the protected area
and must not be served unless the user is authenticated.  Normalizing these
URLs makes the url-pattern comparison possible.
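
The following sketch illustrates why the comparison in point 3 has to run
on the normalized path.  It is a simplified model, not the container's
actual constraint logic: ConstraintSketch, PROTECTED_PREFIX and the method
names are hypothetical, and the protected area is assumed to be the
url-pattern /a/b/*.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a url-pattern check; not Tomcat's implementation.
class ConstraintSketch {

    // Stands for a security-constraint with url-pattern /a/b/*.
    static final String PROTECTED_PREFIX = "/a/b";

    // Compares the path exactly as given, without normalizing it first.
    static boolean naiveIsProtected(String path) {
        return path.equals(PROTECTED_PREFIX)
                || path.startsWith(PROTECTED_PREFIX + "/");
    }

    // Resolve "." and ".." and skip empty segments.
    static String normalize(String path) {
        List<String> kept = new ArrayList<>();
        for (String seg : path.split("/")) {
            if (seg.isEmpty() || seg.equals(".")) {
                continue;
            }
            if (seg.equals("..")) {
                if (!kept.isEmpty()) {
                    kept.remove(kept.size() - 1);
                }
            } else {
                kept.add(seg);
            }
        }
        return "/" + String.join("/", kept);
    }

    // Normalize first, then compare against the protected prefix.
    static boolean isProtected(String rawPath) {
        return naiveIsProtected(normalize(rawPath));
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"/a/./b/c", "/a/c/../b/d", "/a/c"}) {
            System.out.println(raw + " naive=" + naiveIsProtected(raw)
                    + " normalized=" + isProtected(raw));
        }
    }
}
```

The naive check misses both /a/./b/c and /a/c/../b/d even though they name
resources inside the protected area; the normalized check catches them.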

4)  Now, for the case of two adjacent slash characters.  It comes down to
whether the URLs http://server/a/b/c, http://server/a/b//c and
http://server/a//b/////c refer to the same resource or not.  Every implementation
I've seen treats an empty hierarchy part identically to no hierarchy part.  All
of these URLs refer to the same resource, regardless of whether the resource
comes from a static file or a servlet.  If /a/* is the prefix mapping for a
servlet then, just like in 2) above, servlet a must expect that it can
receive normalized path info and if it doesn't receive one it should
normalize it anyway so that it serves the same resource regardless of how
normalized the initial URL was.  If /a/b/* is a security constraint then all
of these URLs refer to a protected resource.
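
As a sketch of the defensive normalization suggested in point 4, a servlet
could collapse adjacent slashes in whatever path info it receives.  The
class SlashSketch and its methods are hypothetical, and the sample inputs
are assumed path components in the spirit of the URLs above.

```java
// Hypothetical helper a servlet could apply to its own path info.
class SlashSketch {

    // Replace every run of two or more slashes with a single slash.
    static String collapseSlashes(String path) {
        return path.replaceAll("/{2,}", "/");
    }

    // Null-safe wrapper: path info is null when there is no extra path.
    static String effectivePathInfo(String pathInfo) {
        return pathInfo == null ? null : collapseSlashes(pathInfo);
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"/a/b/c", "/a/b//c", "/a//b/////c"}) {
            System.out.println(raw + " -> " + collapseSlashes(raw));
        }
    }
}
```

All three inputs collapse to /a/b/c, so the servlet serves the same
resource however the slashes were written in the request.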

5)  A servlet or CGI script should not expect to be able to receive
non-normalized information in path info.  URLs are subject to rewriting at
several points:  by the originating browser, intervening proxy servers, http
servers and the servlet container.  If a servlet or CGI needs data that
*must not* be subject to URL normalization then it *must* be passed in the
query string.
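
Here is a minimal sketch of the query string approach from point 5, using
the standard java.net.URLEncoder and URLDecoder classes.  The class
QueryParamSketch, the URL http://server/servlet and the parameter name
"key" are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper; the URL and parameter name are illustrative only.
class QueryParamSketch {

    // Percent-encode a value so that "/" becomes %2F.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Reverse of encode().
    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Put the value in the query string instead of the path.
    static String buildUrl(String base, String name, String value) {
        return base + "?" + name + "=" + encode(value);
    }

    public static void main(String[] args) {
        String url = buildUrl("http://server/servlet", "key", "fu//bar");
        System.out.println(url);
        String encoded = url.substring(url.indexOf('=') + 1);
        System.out.println(decode(encoded));
    }
}
```

The encoded value contains no literal slashes for path normalization to
act on, and decoding it gives back fu//bar unchanged.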

6)  This behaviour did change between 3.2.2 and 3.2.3, but this behaviour is
not protected by any specification.  Yes, the change can break existing
(non-portable) code and yes, I'm sorry, it does break the code in your book.
But, IMHO, that code is not a good example of the usage of getPathInfo() and
it is not portable.

7)  You are welcome to submit a patch that provides the functionality that
you're looking for as long as you can make absolutely certain that no resource
protected by a security constraint can be inadvertently served by any oddly
formatted URL.  I won't commit that patch because I think it's a bad idea, but
you might find someone here who will commit it.  Actually I notice that you're
listed as a Tomcat committer so you could just do it yourself.


Marc Saegesser