Posted to test-dev@httpd.apache.org by Rodent of Unusual Size <Ke...@Golux.Com> on 2001/11/09 12:10:32 UTC

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

john sachs wrote:
> 
> anyway, in doing so, i noticed that 1.3 serves the page
> as you'd expect.  in 2.0, you get 404.  which is "correct"?

404 is most definitely not correct.  Adding a '/', optionally
followed by more data, to the end of a mapped filename is
perfectly valid and defines the 'path-info'.  1.3 seems to be
correctly differentiating between the resource and its path-info;
2.0 is probably trying to treat the whole thing as a resource
and hence not finding it.
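
To make the resource/path-info distinction concrete, here is a minimal Python sketch of the split, not how httpd itself walks the filesystem. The function name `split_path_info` and the `files` set (standing in for a stat of the document root) are assumptions made for illustration only.

```python
def split_path_info(uri_path, files):
    """Split uri_path into (resource, path_info).

    `files` is a hypothetical set of URI paths that map to regular files
    under the document root; a real server would consult the filesystem.
    Returns (None, "") when no leading portion of the path maps to a file.
    """
    segments = uri_path.split("/")
    # Try the longest prefix first, then progressively shorter ones.
    for end in range(len(segments), 0, -1):
        candidate = "/".join(segments[:end])
        if candidate in files:
            remainder = "/".join(segments[end:])
            # Anything left over after the mapped file is the path-info.
            path_info = "/" + remainder if end < len(segments) else ""
            return candidate, path_info
    return None, ""


if __name__ == "__main__":
    known = {"/index.html"}
    for path in ("/index.html", "/index.html/", "/index.html/more/stuff"):
        print(path, "->", split_path_info(path, known))
```

Whether a non-empty path-info should then produce a 200 or a 404 is exactly what the rest of this thread argues about; the sketch only shows the split.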

Code for a 200 return, and a response body that matches the
document's correctly-rendered (as opposed to raw) content.
-- 
#ken	P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Author, developer, opinionist      http://Apache-Server.Com/

"All right everyone!  Step away from the glowing hamburger!"

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

Posted by john sachs <js...@covalent.net>.

right, so i'll hold off on writing the test for the moment.  once this gets all hashed out, could someone please post the correct expected results for

GET /index.html/
GET /index.shtml/
GET /index.cgi/

and if the result would vary for something like

GET /index.html/more/stuff

post what the expected behavior should be for that test too.
thanks.
-j

for what it's worth, i thought GET /index.html/ should be 404 because it would seem the server would want to translate that URI to <DocumentRoot>/index.html/<DirectoryIndex>.  and since there is no directory called <DocumentRoot>/index.html, the resource would not be found.  make sense?

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

Posted by "Roy T. Fielding" <fi...@ebuilt.com>.

On Fri, Nov 09, 2001 at 03:16:03PM -0500, Rodent of Unusual Size wrote:
> Rodent of Unusual Size wrote:
> > 
> > "William A. Rowe, Jr." wrote:
> > >
> > > Since the default handler doesn't glom onto trailing path_info,
> > > it doesn't get handled.  All 1.3 pages I tested _without_ SSI's
> > > enabled returned 404.
> > 
> > Irrelevant.  It is perfectly possible and acceptable to submit
> > path-info to a resource that cannot handle it (e.g., a text/plain
> > file).  If we return 404 for that, we are broken -- period.
> 
> Nrrgh, I take that back, at least temporarily.  Objections
> currently in abeyance while I check some stuff.. :-/

Good, because I was about to slap you silly.  ;-)

One of the goals of an HTTP server is to give the server owner control
over their http namespace.  Allowing every storage component to map into
an infinite number of URIs is not desirable.  It isn't even desirable for
gateways like CGI, SSI, and JSP, though it is necessary in order for those
storage components to become dynamic handlers of collection resources.
Extra path info is therefore only valid for those components for which
handlers are defined that are capable of being collections (like mod_index)
or being a potential gateway to a collection.

This would be a lot easier for the server to determine if we had some sort
of config or module rec field that indicated the handler wants path info.
The default handler definitely does not want path info.
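
As a rough illustration of the "field that indicates the handler wants path info" idea, the following Python sketch models it outside of httpd. The `Handler` record, the `wants_path_info` flag, the extension table, and `status_for` are all hypothetical names invented for this example; nothing here is an existing Apache API.

```python
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class Handler:
    name: str
    wants_path_info: bool  # hypothetical per-handler flag


# Assumed mapping for illustration: gateways accept path info,
# the default (static file) handler does not.
DEFAULT_HANDLER = Handler("default", wants_path_info=False)
HANDLERS = {
    ".shtml": Handler("ssi", wants_path_info=True),
    ".cgi": Handler("cgi", wants_path_info=True),
}


def status_for(resource, path_info):
    """Return the status this policy would give for resource + path_info."""
    ext = posixpath.splitext(resource)[1]
    handler = HANDLERS.get(ext, DEFAULT_HANDLER)
    # Extra path info is rejected unless the handler has asked for it.
    if path_info and not handler.wants_path_info:
        return 404
    return 200


if __name__ == "__main__":
    print(status_for("/index.html", "/"))          # static file plus path info
    print(status_for("/index.cgi", "/more/stuff"))  # gateway plus path info
```

Under this policy the decision is made before the handler runs, which is the appeal of a declared flag over inspecting handler behaviour after the fact.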

....Roy

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.

Rodent of Unusual Size wrote:
> 
> "William A. Rowe, Jr." wrote:
> >
> > Since the default handler doesn't glom onto trailing path_info,
> > it doesn't get handled.  All 1.3 pages I tested _without_ SSI's
> > enabled returned 404.
> 
> Irrelevant.  It is perfectly possible and acceptable to submit
> path-info to a resource that cannot handle it (e.g., a text/plain
> file).  If we return 404 for that, we are broken -- period.

Nrrgh, I take that back, at least temporarily.  Objections
currently in abeyance while I check some stuff.. :-/
-- 
#ken	P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Author, developer, opinionist      http://Apache-Server.Com/

"All right everyone!  Step away from the glowing hamburger!"

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

Posted by Rodent of Unusual Size <Ke...@Golux.Com>.

"William A. Rowe, Jr." wrote:
> 
> Yes, and no.
> 
> Since the default handler doesn't glom onto trailing path_info,
> it doesn't get handled.  All 1.3 pages I tested _without_ SSI's
> enabled returned 404.

Irrelevant.  It is perfectly possible and acceptable to submit
path-info to a resource that cannot handle it (e.g., a text/plain
file).  If we return 404 for that, we are broken -- period.

> > Code for a 200 return, and a response body that matches the
> > document's correctly-rendered (as opposed to raw) content.
> 
> That would be a good convention, against an SSI page.

No, it is the correct answer, not a convention.

> The real issue is ending up with hundreds of robot hits (or goofy
> caching state) against a site with an infinite number of pages...

No, that is not the real issue.  The real issue is that we
are lying when we say 'not found'.  The robot hits issue is
separate and something of which to be aware and for which to
plan, but it does *not* dictate the correct behaviour of
the server.
-- 
#ken	P-)}

Ken Coar, Sanagendamgagwedweinini  http://Golux.Com/coar/
Author, developer, opinionist      http://Apache-Server.Com/

"All right everyone!  Step away from the glowing hamburger!"

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.

From: "Roy T. Fielding" <fi...@ebuilt.com>
Sent: Friday, November 09, 2001 2:19 PM


> > Since SSI is another beast, it accepts path_info and serves the
> > page.
> 
> Yes, though I wish I could find a way to prevent it from doing so
> if it did not expect path_info.

I've been thinking the same thing... same with CGI.  It would be great if there
were a way to 'consume' path_info or else 404.  No trivial solution that I
could come up with.
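
One way to picture "consume path_info or else 404" is the Python sketch below. It assumes the whole response can be buffered and discarded, which is a large part of why this is not trivial in a real server; `Request`, `dispatch`, and the `consumed` flag are hypothetical names for illustration, not anything in httpd.

```python
class Request:
    """Toy request that records whether a handler looked at path_info."""

    def __init__(self, path_info):
        self.raw_path_info = path_info
        self.consumed = False

    @property
    def path_info(self):
        # Reading path_info through this property counts as consuming it.
        self.consumed = True
        return self.raw_path_info


def dispatch(handler, request):
    """Run handler, then 404 if extra path info was left unread."""
    body = handler(request)  # response is buffered, not yet sent
    if request.raw_path_info and not request.consumed:
        return 404, ""
    return 200, body


def static_page(request):
    return "hello"  # never looks at path_info


def echo_script(request):
    return "path info was " + request.path_info


if __name__ == "__main__":
    print(dispatch(static_page, Request("/are/you/there")))
    print(dispatch(echo_script, Request("/are/you/there")))
```

The weakness is visible even here: by the time the check runs, the handler has already done its work, so any side effects have happened and a streamed response could not be taken back.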

> > etc.  A possible convention, against the core handler, would be an
> > external redirect back to /index.html to keep all that cruft away.
> 
> No, those should be 404 unless .html is SSI.

As an option, as opposed to a dictate?  Could even provide that shtml's could
leave that unset if they don't intend to use path_info.

> > CGI authors have to deal with this issue in whatever way is appropriate,
> > if they use path_info at all.
> 
> Likewise for JSP.

And every other scripting technology ;)

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

Posted by "Roy T. Fielding" <fi...@ebuilt.com>.

> > john sachs wrote:
> > > 
> > > anyway, in doing so, i noticed that 1.3 serves the page
> > > as you'd expect.  in 2.0, you get 404.  which is "correct"?
> > 
> > 404 is most definitely not correct.  Adding a '/', optionally
> > followed by more data, to the end of a mapped filename is
> > perfectly valid and defines the 'path-info'.  1.3 seems to be
> > correctly differentiating between the resource and its path-info;
> > 2.0 is probably trying to treat the whole thing as a resource
> > and hence not finding it.
> 
> Yes, and no.
> 
> Since the default handler doesn't glom onto trailing path_info,
> it doesn't get handled.  All 1.3 pages I tested _without_ SSI's
> enabled returned 404.

Yep, that's what it is supposed to do.

> Since SSI is another beast, it accepts path_info and serves the
> page.

Yes, though I wish I could find a way to prevent it from doing so
if it did not expect path_info.

> It seems IBM's core handler was tweaked, and in doing so, exposed
> this hole.
> 
> > Code for a 200 return, and a response body that matches the
> > document's correctly-rendered (as opposed to raw) content.
> 
> That would be a good convention, against an SSI page.
> 
> The real issue is ending up with hundreds of robot hits (or goofy
> caching state) against a site with an infinite number of pages...
> 
> /index.html
> /index.html/hello?
> /index.html/are/you/there
> 
> etc.  A possible convention, against the core handler, would be an
> external redirect back to /index.html to keep all that cruft away.

No, those should be 404 unless .html is SSI.

> CGI authors have to deal with this issue in whatever way is appropriate,
> if they use path_info at all.

Likewise for JSP.

....Roy

Re: [franklin_tech_bulletins@yahoo.com: IBM AS/400 HTTP Server '/' attack]

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.

From: "Rodent of Unusual Size" <Ke...@Golux.Com>
Sent: Friday, November 09, 2001 5:10 AM

> john sachs wrote:
> > 
> > anyway, in doing so, i noticed that 1.3 serves the page
> > as you'd expect.  in 2.0, you get 404.  which is "correct"?
> 
> 404 is most definitely not correct.  Adding a '/', optionally
> followed by more data, to the end of a mapped filename is
> perfectly valid and defines the 'path-info'.  1.3 seems to be
> correctly differentiating between the resource and its path-info;
> 2.0 is probably trying to treat the whole thing as a resource
> and hence not finding it.

Yes, and no.

Since the default handler doesn't glom onto trailing path_info,
it doesn't get handled.  All 1.3 pages I tested _without_ SSI's
enabled returned 404.

Since SSI is another beast, it accepts path_info and serves the
page.

It seems IBM's core handler was tweaked, and in doing so, exposed
this hole.

> Code for a 200 return, and a response body that matches the
> document's correctly-rendered (as opposed to raw) content.

That would be a good convention, against an SSI page.

The real issue is ending up with hundreds of robot hits (or goofy
caching state) against a site with an infinite number of pages...

/index.html
/index.html/hello?
/index.html/are/you/there

etc.  A possible convention, against the core handler, would be an
external redirect back to /index.html to keep all that cruft away.

CGI authors have to deal with this issue in whatever way is appropriate,
if they use path_info at all.

Bill