Posted to dev@httpd.apache.org by Igor Tatarinov <ta...@prairie.NoDak.edu> on 1997/12/18 21:26:28 UTC

On avoiding stat()

Hi all!

As you might remember, I am trying to add a document cache to Apache that
would (hopefully) decrease the number of I/Os and save some CPU cycles by
avoiding open/mmap/close.

So I have a question: is there any way to avoid the stat() call? I know
that Apache needs to make it in order to mmap the file, but if I already
have the file in the cache and know its size, there is no reason to stat()
the file. Of course, the file may have been updated, but this does not
happen often, and I could probably check for "If-Modified-Since" and stat
the file only if a reload was requested.

any ideas?

thanks,
igor

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Igor Tatarinov, graduate student, Computer Science Dept, NDSU
e-mail: tatarino@prairie.nodak.edu   or   itat@acm.org
http://www.cs.ndsu.nodak.edu/~tatarino
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: On avoiding stat()

Posted by Dean Gaudet <dg...@arctic.org>.
How are you indexing your cache?  By file or by URL?

There are things that you can easily break by skipping process_request.
For example:

BrowserMatch BadRobot/0.9 deny
<Directory />
    order allow,deny
    allow from all
    deny from env=deny
</Directory>

or,

BrowserMatch MSIE/4 downgrade-1.0

or any number of other things that go on in process_request().

That said, I know one particular new-httper who short-circuits responses
and avoids some of the phases.  It's not a general-purpose technique,
though, imho.

Dean

Re: On avoiding stat()

Posted by Igor Tatarinov <ta...@prairie.NoDak.edu>.
No, I don't want to rewrite what has already been written. Here is what
I would like to do:

Main rule:
a file can only enter the cache once it has been through all of Apache's
normal processing.

Hence:
1) Sometime before process_request(), I check whether the requested file
is in my cache (I may have to check a lot of CGI-request names etc.)

2) If yes, and the request is not IMS (If-Modified-Since), I send the
file from the cache, assuming that it is no big deal if it's a bit
stale. I need to make sure process_request() is not called.

   If no, I call process_request().
If the request comes back to me, I may want to cache it.

The reason why I want to avoid stat() is that after stat(), open() is
much less of a problem (the directory block has already been read).

Is what I said realistic?
The main problem is to avoid the process_request() call after I send
the file from the cache :(

thanks,
igor
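
The plan above could be sketched roughly as follows. This is only an
illustration of the control flow, not Apache code: the cache_entry
struct, the linear cache_lookup(), and decide() are hypothetical names
invented for this sketch, and a real module would still have to find a
hook that runs early enough. IMS requests and cache misses go through
normal processing; only non-IMS hits are served from the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry, keyed by request URI (not an Apache type). */
struct cache_entry {
    const char *uri;   /* request URI used as the cache key */
    const char *body;  /* cached document bytes */
    size_t len;        /* length of body */
};

enum cache_action { SERVE_FROM_CACHE, RUN_NORMAL_PROCESSING };

/* Linear search keeps the sketch short; a real cache would hash the key. */
static const struct cache_entry *cache_lookup(const struct cache_entry *tab,
                                              size_t n, const char *uri)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (strcmp(tab[i].uri, uri) == 0)
            return &tab[i];
    }
    return NULL;
}

/* Decide what to do before the normal request processing would run.
 * is_ims is nonzero when the request carries If-Modified-Since. */
static enum cache_action decide(const struct cache_entry *tab, size_t n,
                                const char *uri, int is_ims)
{
    assert(uri != NULL);
    if (is_ims)
        return RUN_NORMAL_PROCESSING;  /* let the server stat() and validate */
    if (cache_lookup(tab, n, uri) != NULL)
        return SERVE_FROM_CACHE;       /* possibly slightly stale copy */
    return RUN_NORMAL_PROCESSING;      /* miss: may be cached afterwards */
}
```

Note that SERVE_FROM_CACHE skips access checks, BrowserMatch handling,
and everything else that normal processing does, which is the concern
raised elsewhere in this thread.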

Re: On avoiding stat()

Posted by Dean Gaudet <dg...@arctic.org>.
I'd say you're on your own.  Nobody has done this as far as I know, but
it's supposed to be possible.

I strongly urge you NOT to do this.  It will make your module useless for
any distributed site.  Not to mention that you'll have to implement all
the symlink checks, .htaccess handling, and more yourself.

Dean

Re: On avoiding stat()

Posted by Igor Tatarinov <ta...@prairie.NoDak.edu>.
Sounds good. What do I do? 

Do I override translate_handler or post_read_request? I don't see how
I can avoid process_request_internal(). I don't want it to be called
on a cache hit.

igor


Re: On avoiding stat()

Posted by Dean Gaudet <dg...@arctic.org>.
Actually no... I'm not telling the whole truth... you could implement
everything in terms of URIs, make sure that r->filename is some
nonexistent file, set r->finfo.st_mode = 0, and totally skip all of
Apache's filesystem handling.  This would be similar to how you would map
a URL-space onto a database in Apache.

Dean

Re: On avoiding stat()

Posted by Dean Gaudet <dg...@arctic.org>.
You're ignoring files which are automatically updated by external
processes -- which is the norm for all the large servers I deal with.  The
files are created on an internal host, then mirrored to N webservers via
rdist or rsync.

At any rate, no, there's no way to get rid of the stat().  It's all tied
up with how Apache decides that your module will handle the request.  A
possible future optimization would be a 5- or 10-second stat() cache in
multithreaded servers.

Dean
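
A short-lived stat() cache of the kind mentioned above might look
something like the sketch below. It is only an illustration under
several assumptions: the names (cached_stat, stat_slot, stat_calls), the
table size, and the 10-second lifetime are all invented here, the caller
passes in the current time, and there is no locking, which a
multithreaded server would need. A cached failure also does not restore
errno.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

#define STAT_CACHE_SLOTS    64
#define STAT_CACHE_TTL      10   /* seconds an entry stays valid */
#define STAT_CACHE_PATH_MAX 256

struct stat_slot {
    char path[STAT_CACHE_PATH_MAX];
    struct stat st;    /* saved result, valid only when rc == 0 */
    int rc;            /* saved return value of stat() */
    time_t fetched;    /* when the real stat() was made */
    int used;
};

static struct stat_slot slots[STAT_CACHE_SLOTS];
static unsigned long stat_calls;  /* number of real stat() calls made */

static unsigned hash_path(const char *p)
{
    unsigned h = 5381;
    while (*p)
        h = h * 33 + (unsigned char)*p++;
    return h % STAT_CACHE_SLOTS;
}

/* Like stat(), but reuses a result younger than STAT_CACHE_TTL seconds. */
static int cached_stat(const char *path, struct stat *out, time_t now)
{
    struct stat_slot *s;

    assert(path != NULL && out != NULL);
    if (strlen(path) >= STAT_CACHE_PATH_MAX) {  /* too long to cache */
        stat_calls++;
        return stat(path, out);
    }
    s = &slots[hash_path(path)];
    if (!s->used || strcmp(s->path, path) != 0 ||
        now < s->fetched || now - s->fetched >= STAT_CACHE_TTL) {
        /* Miss, collision, or expired entry: do the real call. */
        s->rc = stat(path, &s->st);
        stat_calls++;
        strcpy(s->path, path);
        s->fetched = now;
        s->used = 1;
    }
    if (s->rc == 0)
        *out = s->st;
    return s->rc;
}
```

The trade-off is the one discussed in this thread: a file replaced by
rdist or rsync can be served with old metadata for up to
STAT_CACHE_TTL seconds.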

Re: On avoiding stat()

Posted by Igor Tatarinov <ta...@prairie.NoDak.edu>.
If the author of the page requests it, it will be reloaded, and I should
be able to detect that. If somebody else requests a stale page, the server
can send a stale copy (this assumes I do stat the file every once in a
while, say every 30 min).

In fact, this means that I will never have a stale copy if the author
takes a look at the final version of his/her masterpiece. And this is
typically the case, I'd guess.

am I missing something?
igor

Re: On avoiding stat()

Posted by Dean Gaudet <dg...@arctic.org>.
You need to stat() the file regardless.  How else do you determine if the
file in your cache is stale?  Even on non-IMS requests you need to stat(). 
Otherwise you have to implement full HTTP/1.1 proxy semantics, which is
non-trivial and not worth saving the stat() call imho. 

Dean
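
For illustration, the staleness check argued for above comes down to
comparing what the cache recorded with what stat() reports now. The
sketch below assumes the cache stored the file's modification time and
size when the entry was created; cached_doc, check_freshness(), and the
enum names are hypothetical and not taken from Apache.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* What a cache entry would need to remember about the file on disk. */
struct cached_doc {
    time_t mtime;  /* st_mtime when the file was cached */
    off_t size;    /* st_size when the file was cached */
};

enum freshness { DOC_FRESH, DOC_STALE, DOC_UNAVAILABLE };

/* One stat() per request: any change in mtime or size means the cached
 * copy must be dropped and the file read again. */
static enum freshness check_freshness(const char *path,
                                      const struct cached_doc *doc)
{
    struct stat st;

    assert(path != NULL && doc != NULL);
    if (stat(path, &st) != 0)
        return DOC_UNAVAILABLE;  /* removed, or no longer accessible */
    if (st.st_mtime != doc->mtime || st.st_size != doc->size)
        return DOC_STALE;
    return DOC_FRESH;
}
```

This still pays for the stat() on every hit, but it saves the
open/mmap/close when the result is DOC_FRESH.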
