Posted to dev@httpd.apache.org by David Bishop <db...@micron.com> on 2002/05/03 00:49:45 UTC

filehandle caching and nfs


[I am sending this to dev@ because 1) I think it will require a source-code fix,
and 2) user@ had no help for me]

I have a problem with our Apache web server (v1.3.14) running on Solaris 7.
A lot of our directories are NFS-mounted through the automounter (especially
the ~username ones). 95% of the time it works great; however, intermittently
it returns "no such file or directory" for files that were there 5 minutes
before, and that are still there if you rsh to the web server and look at the
filesystem. And, most importantly, refreshing the page "fixes" it (i.e.,
returns the correct page).

My theory is that the automounter daemon unmounts the filesystem after 5
minutes of inactivity, but Apache "caches" the filehandle it used the last
time. Then, when you hit that page again, it looks at
"/amd/u2pesfs2/blah/foo" rather than "/u/blah/foo", which (obviously) doesn't
work, because /amd/... has been unmounted and isn't automatically remounted
just by referencing it (as opposed to /u/...). It then returns a 404 and
flushes the fh out of its cache. Thus, the next time you request the page, it
goes for /u/blah/foo, the filesystem is automounted again, and everything is
fine.

Now, the question is: how do I fix this? It appears that Apache does flush
the filehandles out of its cache after a set amount of time, or we would run
into this problem a lot more often. How do I adjust that amount to be less
than 5 minutes? Or, alternatively, change it so that if it has cached the fh
and gets a "no such file", it "starts from the beginning" and tries it
uncached before returning the 404?
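The "try again before returning the 404" idea can be sketched in C roughly as follows. This is not Apache code: `stat_retry_enoent` is a hypothetical helper, it retries immediately (a real version might want a short pause between attempts), and it does not model any filehandle cache; it only shows the shape of a retry on ENOENT.

```c
#include <errno.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of Apache): stat() a path and, if the
 * lookup fails with ENOENT, look it up again from scratch up to `retries`
 * more times before giving up. Returns 0 on success, or -1 with errno set
 * by the last attempt, just like stat(). */
int stat_retry_enoent(const char *path, struct stat *st, int retries)
{
    int rc = stat(path, st);
    while (rc == -1 && errno == ENOENT && retries-- > 0)
        rc = stat(path, st);    /* fresh path lookup; no saved state reused */
    return rc;
}
```

A caller would still turn a final ENOENT into a 404; the retry only helps if the failure is transient.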

Any help would be greatly appreciated, and many thanks for a fine webserver.

-- 
dbishop@micron.com
"Microsoft is a cross between The Borg and the Ferengi. Unfortunately 
they use Borg to do their marketing and Ferengi to do their programming."
								Simon Oke


Re: filehandle caching and nfs

Posted by Tony Finch <do...@dotat.at>.
On Thu, May 02, 2002 at 04:49:45PM -0600, David Bishop wrote:
> 
> I have a problem with our apache webserver (v. 1.3.14), running on solaris 7.
> A lot of our directories are auto-nfs mounted (esp. the ~username stuff).
> 95% of the time it works great, however, intermittently, it will return "no
> such file or directory", for files that were there 5 minutes before, and
> still are there if you rsh to the webserver and look at the filesystem.  And,
> most importantly, refreshing the page "fixes" it (i.e., returns the correct
> page).
> 
> My theory for what's happening is that the automounter daemon unmount the fs
> after 5 minutes of inactivity, but that apache "caches" the filehandle that
> it used the last time.  Then, when you go to hit that page again, it looks at
> "/amd/u2pesfs2/blah/foo", rather than "/u/blah/foo", which (obviously)
> doesn't work as the /amd/... has been unmounted, and isn't automatically
> remounted just by referencing it (as opposed to /u/...).  It then returns a
> 404, and flushed the fh out of it's cache. Thus, the next time you request
> the page, it goes for the /u/blah/foo, the fs is automounted again, and
> everything is fine.

I have seen this problem before on a Solaris system that was doing
large-scale virtual hosting, where the mapping from virtual hosts to physical
directories was done via the automounter (rather than via symlinks, as one
would do on other unices). The problem has nothing to do with Apache itself;
rather, Apache exposes a performance problem in the Solaris automounter.
The problem became more noticeable as the load on the system increased.

In the typical case, path lookups that go through an automounted directory
hit a cache in the kernel, and it happens that this cache lookup is faster
(O(1)) than a UFS directory lookup (O(N)) for large directories, which is
why we used it for large-scale vhosting. When the cache lookup fails, an
upcall is made to the userland automountd, which does a lookup in the
automount tables; in my case these were large text files, and in your case
they may be the password file or NIS maps. The upcall to automountd is
expensive, and automountd only handles one upcall at a time.
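To make the asymmetry concrete, here is a toy C model (not Solaris source) of an automounted directory whose kernel cache only remembers names that exist. The map contents (`alice`, `bob`, `carol`), the cache size, and all function names are made up for illustration; the point is only that a hit never reaches automountd, while every miss does.

```c
#include <stdbool.h>
#include <string.h>

/* Toy model: the kernel caches names it has already resolved in an
 * automounted directory such as /home. Only successful lookups are
 * cached, so every miss becomes an upcall to automountd. */
#define CACHE_SLOTS 64
#define NAME_LEN 32

static char cache[CACHE_SLOTS][NAME_LEN];
static int cache_used = 0;
static int upcalls = 0;     /* how many times automountd was consulted */

/* Stand-in for automountd searching its maps (hypothetical entries). */
static bool automountd_lookup(const char *name)
{
    static const char *const map[] = { "alice", "bob", "carol" };
    upcalls++;                                  /* the expensive part */
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(map[i], name) == 0)
            return true;
    return false;
}

/* Resolve one name in the automounted directory. The linear scan keeps the
 * sketch short; the kernel cache described above is O(1). */
bool autofs_lookup(const char *name)
{
    for (int i = 0; i < cache_used; i++)
        if (strcmp(cache[i], name) == 0)
            return true;                        /* fast path: cache hit */
    bool found = automountd_lookup(name);       /* slow path: upcall */
    if (found && cache_used < CACHE_SLOTS && strlen(name) < NAME_LEN)
        strcpy(cache[cache_used++], name);      /* positive results only */
    return found;
}
```

Looking up `alice` twice costs one upcall; looking up `.htaccess` twice costs two, because the miss is never remembered.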

The problem is that the kernel does not do negative caching of automount
lookups, so if you repeatedly request a missing file in an automounted
directory, you can overload automountd. In this situation you don't block
waiting for automountd; you get an error return. I can't remember the exact
errno value, but the result was a "403 Forbidden" from Apache. This might
have changed in more recent versions of Solaris: we were running 2.6, and I
seem to recall reporting the problem to Sun, but before they got anywhere
near a fix (which took far longer than we could wait) I had an adequate
workaround, so I didn't pursue a proper solution. They might have just
changed the errno value to ENOENT...
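For contrast, here is a toy C model with negative caching, which is what the kernel was not doing: a "not found" answer is remembered for a while, so a burst of probes for a missing name costs one upcall. `NEG_TTL`, the explicit `now` argument, and the function names are assumptions made to keep the sketch small and testable; nothing here reflects how Sun actually addressed the issue.

```c
#include <stdbool.h>
#include <string.h>

#define SLOTS 64
#define NAME_LEN 32
#define NEG_TTL 30   /* hypothetical: seconds a "not found" answer is trusted */

struct entry {
    char name[NAME_LEN];
    bool exists;
    long expires;    /* only consulted for negative entries */
};

static struct entry cache[SLOTS];
static int used = 0;
static int upcalls = 0;

/* Stand-in for automountd (hypothetical map with two users). */
static bool automountd_lookup(const char *name)
{
    upcalls++;                                   /* the expensive part */
    return strcmp(name, "alice") == 0 || strcmp(name, "bob") == 0;
}

/* Caches both hits and misses; a miss is trusted until `expires`, so
 * repeated probes for e.g. ".htaccess" stop turning into upcalls. */
bool autofs_lookup_negcache(const char *name, long now)
{
    for (int i = 0; i < used; i++) {
        struct entry *e = &cache[i];
        if (strcmp(e->name, name) != 0)
            continue;
        if (e->exists || now < e->expires)
            return e->exists;                    /* positive or fresh negative hit */
        e->exists = automountd_lookup(name);     /* stale negative entry: ask again */
        e->expires = now + NEG_TTL;
        return e->exists;
    }
    bool found = automountd_lookup(name);        /* first lookup of this name */
    if (used < SLOTS && strlen(name) < NAME_LEN) {
        strcpy(cache[used].name, name);
        cache[used].exists = found;
        cache[used].expires = now + NEG_TTL;
        used++;
    }
    return found;
}
```

With this in place, thirty seconds' worth of `.htaccess` probes cost a single upcall instead of one per request.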

The feature of Apache that caused this overloading of automountd is that it
looked for .htaccess in the automounted directory on every request, which
caused a cache miss and a consequent upcall to automountd on every request.
You can spot that this is happening if automountd is using an implausible
proportion of the CPU. In our case the automount maps were text files that
were loaded and parsed for every automountd request, which is what caused
the unnatural CPU usage and the delay on automountd requests; if you are
using a different source for automount maps (particularly NIS), the details
of your failure mode will differ.

The solution is either to alter your Apache configuration so that it
doesn't look for .htaccess files in the automounted directory, e.g.
(I think -- try it and see) instead of

	<Directory /home/>
		AllowOverride all
	</Directory>

(which looks for /home/.htaccess on every request) do

	<Directory /home/*/>
		AllowOverride all
	</Directory>

Alternatively (if you have the right kind of automounter configuration),
add a bogus .htaccess entry to your automount maps, so that the kernel
cache lookup succeeds with a reference to an empty file or something
equally innocuous.
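The configuration change above can be illustrated with a toy model of Apache's per-request directory walk (not Apache's actual code). It assumes `<Directory />` has `AllowOverride None`, that overrides are otherwise enabled only by one of the two `<Directory>` blocks above, and uses a made-up user `alice`. It lists the .htaccess files probed for one request; only the first configuration probes `/home/.htaccess`, the name that misses in the automounter's cache.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PROBES 16
#define PATH_LEN 256

// The two hypothetical configurations being compared.
enum override_cfg {
    CFG_HOME,       // <Directory /home/>   AllowOverride all
    CFG_HOME_STAR   // <Directory /home/*/> AllowOverride all
};

// Does a directory (absolute, with trailing '/') have overrides enabled?
// Assumes <Directory /> has AllowOverride None, so nothing outside /home/ does.
static bool overrides_enabled(const char *dir, enum override_cfg cfg)
{
    static const char prefix[] = "/home/";
    size_t n = sizeof prefix - 1;
    if (strncmp(dir, prefix, n) != 0)
        return false;
    if (cfg == CFG_HOME)
        return true;            // /home/ itself and everything below it
    return dir[n] != '\0';      // only /home/<user>/ and below
}

// Fill `out` with the .htaccess paths probed while walking from / down to
// the directory containing `file`; return how many there are.
int htaccess_probes(const char *file, enum override_cfg cfg,
                    char out[][PATH_LEN], int max)
{
    int n = 0;
    for (const char *p = strchr(file, '/'); p != NULL && n < max;
         p = strchr(p + 1, '/')) {
        size_t len = (size_t)(p - file) + 1;          // keep the trailing '/'
        if (len + sizeof ".htaccess" > PATH_LEN)
            break;
        char *dst = out[n];
        memcpy(dst, file, len);
        dst[len] = '\0';                              // e.g. "/home/"
        if (overrides_enabled(dst, cfg)) {
            memcpy(dst + len, ".htaccess", sizeof ".htaccess");
            n++;                                      // keep this probe
        }
    }
    return n;
}
```

For `/home/alice/public_html/index.html`, the first configuration probes `/home/.htaccess`, `/home/alice/.htaccess` and `/home/alice/public_html/.htaccess`; the second skips the first of these, which is the one that would otherwise trigger an upcall on every request.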

Pointing truss (the Solaris counterpart of strace) at automountd will tell
you whether my guess is right. If it isn't, at least I got to tell an
amusing war story.

Tony.
-- 
f.a.n.finch <do...@dotat.at> http://dotat.at/
BAILEY: VARIABLE BECOMING SOUTHEASTERLY 3 OR 4, INCREASING 5 OR 6 IN WEST
LATER. SHOWERS. GOOD.