Posted to dev@httpd.apache.org by Greg Ames <gr...@remulak.net> on 2001/10/04 23:35:31 UTC

infinite recursive subrequests

...well, at least until my ulimit of 1024 open file descriptors kicks
in.

setup: DocumentRoot contains /index.html, mod_negotiation is built in,
Options MultiViews is coded in the config file.  Directory
<doc_root>/index/ does not exist.

URI: /index/garbage/trash (the /trash on the end is probably irrelevant)

We get into read_types_multi() in mod_negotiation's type checker.  It
has r->filename == <doc_root>/index and neg->dir_name == <doc_root>,
opens up the doc_root directory, and starts reading entries.  The entry
for "index.html" triggers a call to ap_sub_req_lookup_dirent(), which
starts the nearly infinite recursion.  The subrequest URIs are
/index/garbage/index.html.

Thankfully, my ulimit kicks in when mod_negotiation has 1011 open fd's
for the doc_root directory.  Then I get a single 

"(24)Too many open files: cannot read directory for multi: <doc_root>" 

log message, followed by 1011 

"Negotiation: discovered file(s) matching request: <doc_root>/index
(None could be negotiated)"

log messages.

It seems like this code is confused about what the base part of the
desired pathname is, and uses only the part that exists (doc_root) in
the code that opens the directory.  This is wrong.  Since neg->dir_name
isn't the same as the desired pathname up to the last slash, some code
must have already detected a problem, but didn't kill the request.  Or
shouldn't the request die in directory_walk?

Greg

Re: infinite recursive subrequests

Posted by Greg Ames <gr...@remulak.net>.
"William A. Rowe, Jr." wrote:
> 
> This fix to request.c (1.68) definitely appears to have broken the server.
> 
> I know you commented it was 'gross', but perhaps we walk the list of
> r->prev->uri/filename or r->main->uri/filename on each subrequest...
> if we hit 'ourself' we die.

We could, but that's just a sanity check that shows us we're doing
something wrong elsewhere.
We need to fix the code that's generating and using the bad URIs in the
first place.  Maybe we should just wrap the infinite subrequest
recursion checks in #ifdef AP_DEBUG logic and go for it.  What do folks
think?

> Just a thought.  I'd still like more details.  More inline.

> >
> > When I set up my ThinkPad to replicate the apache.org website, I decided
> > to ignore the bug database at first.  So I simply commented out the
> > bugs.apache.org VirtualHost definitions in my config file, and expected to
> > get errors for the bugs db URLs.  The latest chunk of daedalus's
> > production log I'm using for replay testing had a GET for
> > http://bugs.apache.org/index/full/3807 .  When the vhost lookup fails on
> > my TP, we set r->hostname to dev.apache.org, which is the first vhost.
> > (seems odd, but that's a different issue.)  In dev.apache.org's
> > doc_root, the only thing that matches index* is index.html , and I get
> > the subrequest recursion problem.

> I don't see how index.html then recurses?  Please explain?

ap_sub_req_lookup_dirent will end up generating recursive subrequests
with URIs of /index/full/index.html, when mod_negotiation is trying to
look up filename <doc_root>/index.html.  Something (directory_walk?)
trims back r->filename to the valid pieces of the path, but the URI
never gets trimmed to correspond.  That's fine when we just use the
filename, not fine when we use the URI.
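
To make the mismatch concrete, here is a toy model in C.  It is not httpd
code: sibling_uri() is a hypothetical helper, and it assumes the subrequest
URI is formed by replacing the last segment of the parent's (untrimmed) URI
with the directory entry name, which matches the URIs observed above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of httpd: build a subrequest URI by
 * keeping everything up to and including the last '/' of the parent
 * URI and appending the directory entry name. */
static void sibling_uri(const char *parent_uri, const char *entry,
                        char *out, size_t outlen)
{
    const char *slash = strrchr(parent_uri, '/');
    /* Length of the "directory" part, including the trailing slash. */
    size_t dirlen = slash ? (size_t)(slash - parent_uri) + 1 : 0;

    snprintf(out, outlen, "%.*s%s", (int)dirlen, parent_uri, entry);
}
```

Calling sibling_uri("/index/full/3807", "index.html", ...) gives
"/index/full/index.html", and feeding that result back in as the parent
gives the same string again.  Since the URI still carries the nonexistent
/index/full/ prefix, each level looks exactly like the one before it.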

> > But on daedalus, /www/bugs.apache.org/ contains index.cgi .  /index from
> > the URI matches the index.cgi file, and the rest of the URI seems to be
> > used as parms for the cgi.  Therefore knowing where the last slash is in
> > the URI is insufficient to determine what's a directory and what's not,
> > and the fuzzy search for index* in doc_root is necessary to locate a
> > potential cgi.
> 
> Could we have a broken PATH_INFO?  Sounds like the problem.  A user reported
> similar on 1.3.22 :(

We could indeed.  I see your patch for r->path_info and will play with
it.

Greg

Re: infinite recursive subrequests

Posted by "William A. Rowe, Jr." <wr...@covalent.net>.
This fix to request.c (1.68) definitely appears to have broken the server.

I know you commented it was 'gross', but perhaps we walk the list of
r->prev->uri/filename or r->main->uri/filename on each subrequest...
if we hit 'ourself' we die.

Just a thought.  I'd still like more details.  More inline.

----- Original Message ----- 
From: "Greg Ames" <gr...@remulak.net>
To: <de...@httpd.apache.org>
Sent: Friday, October 05, 2001 11:41 AM
Subject: Re: infinite recursive subrequests


> Greg Ames wrote:
> > 
> > ...well, at least until my ulimit of 1024 open file descriptors kicks
> > in.
> > 
> > setup: DocumentRoot contains /index.html, mod_negotiation is built in,
> > Options MultiViews is coded in the config file.  Directory
> > <doc_root>/index/ does not exist.
> > 
> > URI: /index/garbage/trash (the /trash on the end is probably irrelevant)
> 
> > It seems like this code is confused about what the base part of the
> > desired pathname is, and uses only the part that exists (doc_root) in
> > the code that opens the directory.  This is wrong.  Since neg->dir_name
> > isn't the same as the desired pathname up to the last slash, some code
> > must have already detected a problem, but didn't kill the request.  Or
> > shouldn't the request die in directory_walk?
> 
> uggggh, I'm afraid this situation is not as straightforward as I
> thought when I wrote that.
> 
> When I set up my ThinkPad to replicate the apache.org website, I decided
> to ignore the bug database at first.  So I simply commented out the
> bugs.apache.org VirtualHost definitions in my config file, and expected to
> get errors for the bugs db URLs.  The latest chunk of daedalus's
> production log I'm using for replay testing had a GET for
> http://bugs.apache.org/index/full/3807 .  When the vhost lookup fails on
> my TP, we set r->hostname to dev.apache.org, which is the first vhost. 
> (seems odd, but that's a different issue.)  In dev.apache.org's
> doc_root, the only thing that matches index* is index.html , and I get
> the subrequest recursion problem.

I don't see how index.html then recurses?  Please explain?

> But on daedalus, /www/bugs.apache.org/ contains index.cgi .  /index from
> the URI matches the index.cgi file, and the rest of the URI seems to be
> used as parms for the cgi.  Therefore knowing where the last slash is in
> the URI is insufficient to determine what's a directory and what's not,
> and the fuzzy search for index* in doc_root is necessary to locate a
> potential cgi.

Could we have a broken PATH_INFO?  Sounds like the problem.  A user reported
similar on 1.3.22 :(

> I suppose I could come up with a patch for mod_negotiation to abort the
> request if it's about to run a subrequest which will be identical to the
> current request.  But that feels like an incomplete Band-Aid.  If I had
> both index.html and index.cgi in doc_root (eewww! gross!!  maybe this
> should just be banned), it seems like the .cgi could be a legitimate
> match if it maps to an actual cgi, but the .html could not if it is
> simply a static file.  Therefore, it should be OK to run one
> non-recursive subrequest for index.html, and learn that it is just an
> ordinary file which cannot match the original URI, then try index.cgi
> assuming it exists in the directory.

As I suggest, walking our own list of r->prev or r->main would be sufficient.
That way, you could have several redirects, simply not circular.
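
A minimal sketch of that kind of check, in C.  The struct below is a
hypothetical, stripped-down stand-in for request_rec (only the four fields
the check needs), and is_circular() is an illustrative name, not an
existing httpd function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for httpd's request_rec; illustration only. */
struct mini_request {
    const char *uri;
    const char *filename;
    const struct mini_request *main; /* parent, if this is a subrequest */
    const struct mini_request *prev; /* source of an internal redirect */
};

/* Two requests hit the same target if both URI and filename match. */
static int same_target(const struct mini_request *a,
                       const struct mini_request *b)
{
    return strcmp(a->uri, b->uri) == 0
        && strcmp(a->filename, b->filename) == 0;
}

/* Return 1 if any request on r's main chain or prev chain has the
 * same URI and filename as r itself, i.e. we have hit 'ourself'. */
static int is_circular(const struct mini_request *r)
{
    const struct mini_request *p;

    for (p = r->main; p != NULL; p = p->main)
        if (same_target(p, r))
            return 1;
    for (p = r->prev; p != NULL; p = p->prev)
        if (same_target(p, r))
            return 1;
    return 0;
}
```

With the case from this thread, the first subrequest for
/index/full/index.html differs from the original /index/full/3807 and is
allowed through; the second, identical subrequest matches its parent and
would be rejected.  Chains of distinct redirects still pass.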

> Since this is so messy, I'd like to see what happens with 2.0.24 to see
> if this is recent breakage.

Probably.  Before 2.0.25, we had custom code for every internal_redirect and
sub_req_lookup mechanism.  I don't want to return to that state ;)


Re: infinite recursive subrequests

Posted by Greg Ames <gr...@remulak.net>.
Greg Ames wrote:
> 
> ...well, at least until my ulimit of 1024 open file descriptors kicks
> in.
> 
> setup: DocumentRoot contains /index.html, mod_negotiation is built in,
> Options MultiViews is coded in the config file.  Directory
> <doc_root>/index/ does not exist.
> 
> URI: /index/garbage/trash (the /trash on the end is probably irrelevant)

> It seems like this code is confused about what the base part of the
> desired pathname is, and uses only the part that exists (doc_root) in
> the code that opens the directory.  This is wrong.  Since neg->dir_name
> isn't the same as the desired pathname up to the last slash, some code
> must have already detected a problem, but didn't kill the request.  Or
> shouldn't the request die in directory_walk?

uggggh, I'm afraid this situation is not as straightforward as I
thought when I wrote that.

When I set up my ThinkPad to replicate the apache.org website, I decided
to ignore the bug database at first.  So I simply commented out the
bugs.apache.org VirtualHost definitions in my config file, and expected to
get errors for the bugs db URLs.  The latest chunk of daedalus's
production log I'm using for replay testing had a GET for
http://bugs.apache.org/index/full/3807 .  When the vhost lookup fails on
my TP, we set r->hostname to dev.apache.org, which is the first vhost. 
(seems odd, but that's a different issue.)  In dev.apache.org's
doc_root, the only thing that matches index* is index.html , and I get
the subrequest recursion problem.

But on daedalus, /www/bugs.apache.org/ contains index.cgi .  /index from
the URI matches the index.cgi file, and the rest of the URI seems to be
used as parms for the cgi.  Therefore knowing where the last slash is in
the URI is insufficient to determine what's a directory and what's not,
and the fuzzy search for index* in doc_root is necessary to locate a
potential cgi.

I suppose I could come up with a patch for mod_negotiation to abort the
request if it's about to run a subrequest which will be identical to the
current request.  But that feels like an incomplete Band-Aid.  If I had
both index.html and index.cgi in doc_root (eewww! gross!!  maybe this
should just be banned), it seems like the .cgi could be a legitimate
match if it maps to an actual cgi, but the .html could not if it is
simply a static file.  Therefore, it should be OK to run one
non-recursive subrequest for index.html, and learn that it is just an
ordinary file which cannot match the original URI, then try index.cgi
assuming it exists in the directory.

Since this is so messy, I'd like to see what happens with 2.0.24 to see
if this is recent breakage.

Greg