Posted to dev@httpd.apache.org by matt whiteley <ma...@gmail.com> on 2005/07/27 19:22:05 UTC

mod_autoindex caching

I administer a web server that provides mirrors for a number of open source
projects, and I frequently see high load on the server because every file in
a directory is stat()ed on each listing. I would like to have a caching
system for this so that, if the mtime of the directory is not newer
than the cached index, the cached index is displayed. This would both save
CPU time and speed up the response to the user's request.
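
A minimal sketch of the freshness check described above, assuming POSIX stat(). The function names and the idea of keeping the cached index in a separate file are hypothetical and not part of mod_autoindex. Note that a directory's mtime changes only when entries are added, removed, or renamed, so a change to an existing file's size or date would not invalidate the cache under this scheme.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: the cached index is usable when the directory's
 * mtime is not newer than the cache's mtime. */
int index_cache_is_fresh(time_t dir_mtime, time_t cache_mtime)
{
    return dir_mtime <= cache_mtime;
}

/* Returns 1 if the cached index at cache_path may be served for dir_path,
 * 0 if the listing must be regenerated (including when either stat() fails,
 * e.g. because no cache file exists yet). */
int can_serve_cached_index(const char *dir_path, const char *cache_path)
{
    struct stat dir_st, cache_st;

    if (stat(dir_path, &dir_st) != 0) {
        return 0;
    }
    if (stat(cache_path, &cache_st) != 0) {
        return 0;
    }
    return index_cache_is_fresh(dir_st.st_mtime, cache_st.st_mtime);
}
```

With this check, only two stat() calls are needed per request when the cache is usable, instead of one per directory entry.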

Is this something that would be received well as a patch towards
mod_autoindex? What method would be preferred for the caching? I
thought a Berkeley DB file might be appropriate, since the
server already requires it and I didn't want to add a dependency just
for this functionality.

If there is interest in this, I would like to work on it. I wanted to
ask first to ensure that I head off in a direction that fits in
well with other developments. I have been searching around for this
functionality and have not seen it, so I apologize if it is available and I
just missed it.

thanks,
-- 
matt whiteley <ma...@gmail.com>

Re: mod_autoindex caching

Posted by Graham Leggett <mi...@sharp.fm>.
r.pluem@t-online.de wrote:

> I do not think that this will work as expected. Although mod_autoindex can be instructed
> to set a Last-Modified header via IndexOption TrackModified. It does not handle conditional
> GET requests (in contrast to HEAD request) in a resource saving manner, because it does
> regenerate the listing for any GET request. But from my point of view mod_cache tries to
> revalidate the listing for each new client with a conditional GET request, thus causing
> mod_autoindex to regenerate the listing.

Hmmm... good point. Looking at mod_autoindex in more detail, as far as I
can see it doesn't add an ETag to the response, which means a cache can
only validate it based on Last-Modified. The autoindex page is made up of
the directory listing as well as the results of some subrequests, making it
difficult to calculate an ETag until the page has finished being
rendered, at which time it's too late.

mod_expires can set Cache-Control: max-age for you, which will tell 
downstream caches (be they mod_cache, squid, or a browser cache) to 
consider any index younger than a predefined age as fresh, meaning no 
revalidation and no page regeneration.
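
As a rough illustration of the max-age rule described above, here is a simplified sketch of how a cache might decide whether a stored index is still fresh. The function name and parameters are hypothetical, and a real HTTP cache also accounts for the Age and Date headers and other Cache-Control directives, which this ignores.

```c
#include <time.h>

/* Hypothetical helper: a stored response is treated as fresh while its age
 * (time since it was stored) is below the max-age lifetime in seconds.
 * A fresh response can be served without revalidation. */
int cached_index_is_fresh(time_t stored_at, time_t now, long max_age_secs)
{
    double age = difftime(now, stored_at);

    return age >= 0.0 && age < (double)max_age_secs;
}
```

While this returns 1, the cache answers on its own and mod_autoindex is never asked to regenerate the listing.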

The catch is that I am not 100% sure if caching is effective if the 
Cache-Control: max-age is present, but the ETag header is not.

What would be useful is an ETag filter, where responses smaller than a
certain size that do not already have an ETag can get an ETag added
by hashing the content and headers, making the content cacheable.
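
To make the idea concrete, here is a minimal sketch of how such a filter might derive an ETag from a fully buffered response body. It is not an existing httpd filter: the function names are hypothetical, the FNV-1a hash is just one convenient choice, and for brevity only the body is hashed, not the headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a hash over a byte buffer. */
uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t hash = 14695981039346656037ULL; /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        hash ^= data[i];
        hash *= 1099511628211ULL;            /* FNV prime */
    }
    return hash;
}

/* Hypothetical helper: writes a quoted ETag for the body into out.
 * Returns 1 on success, or 0 if the body is larger than max_len (the
 * filter would then pass the response through untouched) or if out is
 * too small to hold the result. */
int make_etag(const char *body, size_t body_len, size_t max_len,
              char *out, size_t out_size)
{
    int n;

    if (body_len > max_len) {
        return 0;
    }
    n = snprintf(out, out_size, "\"%016llx\"",
                 (unsigned long long)fnv1a64((const unsigned char *)body,
                                             body_len));
    return n > 0 && (size_t)n < out_size;
}
```

The size limit matters because the whole body has to be buffered before the hash, and therefore the header, is known.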

Does anybody know if something like this exists already? Google 
searching the httpd.apache.org site shows that the only references to 
ETags are those based on files.

Regards,
Graham
--

Re: mod_autoindex caching

Posted by r....@t-online.de.

Graham Leggett wrote:
> matt whiteley wrote:
> 
>> I admin a webserver that provides mirrors for a number of open source
>> projects and I frequently see high loads on the server as all files in
>> a directory are stated on each listing. I would like to have a caching
>> system for this so that if the mtime of the directory is not newer
>> than the cached index, the cached index is displayed. This saves both
>> cpu time and speeds up the response to the user request.
> 
> 
> Assuming httpd v2.0+, mod_cache should be able to help with this, either
> the mem cache or the disk cache.
> 
> mod_cache can cache any webserver content no matter how generated
> (proxy, CGI, autoindex, whatever).
> 

I do not think that this will work as expected. Although mod_autoindex can be instructed
to set a Last-Modified header via IndexOptions TrackModified, it does not handle conditional
GET requests (in contrast to HEAD requests) in a resource-saving manner, because it
regenerates the listing for every GET request. As far as I can see, mod_cache tries to
revalidate the listing for each new client with a conditional GET request, thus causing
mod_autoindex to regenerate the listing.

So I think mod_cache is the correct tool, but mod_autoindex needs to be fixed to deliver
what is expected of it. The following (untested) patch might do the trick:


--- mod_autoindex.c.orig        2005-02-04 21:21:18.000000000 +0100
+++ mod_autoindex.c     2005-07-28 23:11:46.000000000 +0200
@@ -1948,6 +1948,11 @@
         return HTTP_FORBIDDEN;
     }

+    if ((errstatus = ap_meets_conditions(r)) != OK) {
+        apr_dir_close(thedir);
+        return errstatus;
+    }
+
 #if APR_HAS_UNICODE_FS
     ap_set_content_type(r, "text/html;charset=utf-8");
 #else

Sorry, I currently have no time to test it, so feedback is greatly appreciated.
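
For readers unfamiliar with conditional requests, the effect the patch above aims for can be sketched as follows. This is a standalone simplification, not httpd's ap_meets_conditions(): that function also evaluates If-Match, If-None-Match and If-Unmodified-Since, and returns OK rather than an HTTP status when processing should continue. The names and the "0 means header absent" convention are assumptions for illustration.

```c
#include <time.h>

#define SKETCH_CONTINUE     200  /* generate and send the full listing */
#define SKETCH_NOT_MODIFIED 304  /* client copy is current; skip the listing */

/* Hypothetical helper: decides whether the listing must be generated.
 * last_modified is the value that would be sent as Last-Modified;
 * if_modified_since is the parsed request header, or 0 if absent. */
int check_if_modified_since(time_t last_modified, time_t if_modified_since)
{
    if (if_modified_since != (time_t)0 &&
        last_modified <= if_modified_since) {
        return SKETCH_NOT_MODIFIED;
    }
    return SKETCH_CONTINUE;
}
```

The point of running the check before the listing is built is that a 304 answer then costs no per-file stat() calls.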

Regards

Rüdiger

Re: mod_autoindex caching

Posted by Graham Leggett <mi...@sharp.fm>.
matt whiteley wrote:

> I admin a webserver that provides mirrors for a number of open source
> projects and I frequently see high loads on the server as all files in
> a directory are stated on each listing. I would like to have a caching
> system for this so that if the mtime of the directory is not newer
> than the cached index, the cached index is displayed. This saves both
> cpu time and speeds up the response to the user request.

Assuming httpd v2.0+, mod_cache should be able to help with this, either 
the mem cache or the disk cache.

mod_cache can cache any webserver content no matter how generated 
(proxy, CGI, autoindex, whatever).

Regards,
Graham
--