Posted to modperl@perl.apache.org by Nate Campi <na...@campin.net> on 2003/04/25 22:09:08 UTC

mod_perl caching proxy

I'm trying to cache a mod_perl/SSI site, but the 1.3 mod_proxy needs headers
that aren't there. Apache 2's mod_disk_cache is supposed to work but
simply doesn't; mod_mem_cache isn't supposed to but does, though not
well at all (it only caches after six or more hits to the origin, lame). I serve
over a million hits a day and I need real caching (my site is mentioned
on the perl.apache.org front door, hint hint ;).

I'm wondering about a proxy middle-tier in mod_perl that simply adds
content-length headers so that 1.3 mod_proxy can cache the site.
XBitHack won't work here since much of the SSI is generated by third
party software that can't set the executable bit on generated files.
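The middle tier described here amounts to buffering each origin response in full and stamping a Content-Length on it before the caching proxy sees it. The sketch below shows only that buffering step, written in Python as WSGI middleware rather than mod_perl. `add_content_length` is a hypothetical name, and a real deployment would run as a reverse proxy between mod_proxy and the SSI servers instead of wrapping an in-process app.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


def add_content_length(app: Callable) -> Callable:
    """Wrap a WSGI app so every response carries an accurate Content-Length.

    Hypothetical sketch: the whole body is buffered in memory, which is
    fine for HTML pages but not for large downloads.
    """
    def wrapped(environ: Dict[str, Any], start_response: Callable) -> Iterable[bytes]:
        chunks: List[bytes] = []
        captured: Dict[str, Any] = {}

        def buffering_start_response(status: str,
                                     headers: List[Tuple[str, str]],
                                     exc_info: Optional[tuple] = None):
            captured["status"] = status
            captured["headers"] = list(headers)
            captured["exc_info"] = exc_info
            return chunks.append  # legacy write() output is buffered too

        result = app(environ, buffering_start_response)
        try:
            for chunk in result:
                chunks.append(chunk)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        body = b"".join(chunks)
        # Drop any Content-Length the app sent and recompute it from the body.
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers.append(("Content-Length", str(len(body))))
        start_response(captured["status"], headers, captured["exc_info"])
        return [body]

    return wrapped
```

Wrapping is one line, e.g. `application = add_content_length(ssi_app)`, where `ssi_app` is a stand-in for whatever produces the SSI pages.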

I can't seem to find such a proxy, though it seems it would be easy to
write in mod_perl. A straight Perl daemon would be more lightweight, and
possibly more desirable, but it seems that someone would have belted out a
mod_perl version by now. Are there any?
-- 
Nate Campi    http://www.campin.net 

Re: mod_perl caching proxy

Posted by Ged Haywood <ge...@www2.jubileegroup.co.uk>.
Hi there,

On Fri, 25 Apr 2003, Nate Campi wrote:

> That's not the problem, we *do* use expires. The problem is caching
> the content in the first place.

No idea if it will do the job for you, but have you looked at mod_accel?

73,
Ged.


Re: mod_perl caching proxy

Posted by Nate Campi <na...@campin.net>.
On Fri, Apr 25, 2003 at 08:22:04PM -0700, Randal L. Schwartz wrote:
> >>>>> "Nate" == Nate Campi <na...@campin.net> writes:
> 
> Nate> Good question (how could I expect anything else from Merlyn?), but in my
> Nate> case an easy one. We're a news site so a policy like "cache everything
> Nate> for a default 10 minutes then dump it" will work just fine, maybe even
> Nate> just 5 minutes.
> 
> Then just add mod_expires... you can force "Expires:
> $five_minutes_from_now" with mod_expire for everything served in an
> area.

That's not the problem, we *do* use expires. The problem is caching
the content in the first place.
-- 
Nate Campi    http://www.campin.net 

Re: mod_perl caching proxy

Posted by "Randal L. Schwartz" <me...@stonehenge.com>.
>>>>> "Nate" == Nate Campi <na...@campin.net> writes:

Nate> Good question (how could I expect anything else from Merlyn?), but in my
Nate> case an easy one. We're a news site so a policy like "cache everything
Nate> for a default 10 minutes then dump it" will work just fine, maybe even
Nate> just 5 minutes.

Then just add mod_expires... you can force "Expires:
$five_minutes_from_now" with mod_expire for everything served in an
area.
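In httpd.conf this policy can be written as `ExpiresActive On` followed by `ExpiresDefault "access plus 5 minutes"`; mod_expires then emits both an `Expires` header and a matching `Cache-Control: max-age`. The Python below is only a sketch of that header arithmetic, with `expiry_headers` as a hypothetical helper name.

```python
from email.utils import formatdate
from typing import Dict


def expiry_headers(now: float, ttl_seconds: int = 300) -> Dict[str, str]:
    """Headers for an "access plus N seconds" policy (sketch).

    `now` is the request time as a Unix timestamp; the default TTL of
    300 seconds matches the five minutes mentioned in the thread.
    """
    return {
        "Cache-Control": f"max-age={ttl_seconds}",
        # HTTP dates are always expressed in GMT, hence usegmt=True.
        "Expires": formatdate(now + ttl_seconds, usegmt=True),
    }
```

Given the Date from the response Nate posts later in the thread (Fri, 25 Apr 2003 23:09:47 GMT) and a TTL of 3600, this produces the same `Expires: Sat, 26 Apr 2003 00:09:47 GMT` his server sent.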

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<me...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Re: mod_perl caching proxy

Posted by Nate Campi <na...@campin.net>.
On Fri, Apr 25, 2003 at 02:02:22PM -0700, Randal L. Schwartz wrote:
> >>>>> "Nate" == Nate Campi <na...@campin.net> writes:
> 
> Nate> I'm trying to cache a mod_perl/SSI site, but 1.3 mod_proxy needs headers
> Nate> that aren't there.
> 
> By default, mod_include is required not to add "Last-modified:" and
> "Expires:", because SSI makes the page dynamic.  If you want
> it cacheable, you can drive it from the age of the main page using
> the XBitHack, or from some mod_expires thing as Perrin suggested.

XBitHack would solve all my problems, but pages generated on the fly and
cached to the filesystem won't have the executable bit set at generation
time. I can't cruise the filesystem with find or cfengine or anything
like that since I have at least a million files in my docroot.

> But the real question is, what can your expiration policy be?  How
> long should the cache cache it?

Good question (how could I expect anything else from Merlyn?), but in my
case an easy one. We're a news site so a policy like "cache everything
for a default 10 minutes then dump it" will work just fine, maybe even
just 5 minutes. Being able to move the brunt of the traffic from the
generation Apache boxes (expensive Sun boxes with complicated software
configuration) to some vanilla Apache boxes (cheap x86 Linux, simple
mod_proxy/mod_mem_cache config) would be a huge win for overall cost and
also for scalability.
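The "cache everything for 10 minutes, then dump it" policy is simple enough to sketch directly. The Python class below is hypothetical (it is not mod_proxy or mod_mem_cache code); it stores a page on the very first miss, which is the behaviour wanted here from mod_mem_cache, and takes an injectable clock so it can be tested.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Cache every page for a fixed TTL, then drop it (hypothetical sketch)."""

    def __init__(self, ttl_seconds: float = 600,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> Optional[bytes]:
        entry = self._store.get(url)
        if entry is None:
            return None
        stored_at, body = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[url]  # expired: the next fetch goes to the origin
            return None
        return body

    def fetch(self, url: str, origin: Callable[[str], bytes]) -> bytes:
        body = self.get(url)
        if body is None:
            body = origin(url)  # the first miss is cached immediately
            self._store[url] = (self.clock(), body)
        return body
```

A front-end would call something like `cache.fetch(path, fetch_from_origin)`, where `fetch_from_origin` is a hypothetical function that requests the page from the generation boxes.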
-- 
Nate Campi    http://www.campin.net 

Re: mod_perl caching proxy

Posted by "Randal L. Schwartz" <me...@stonehenge.com>.
>>>>> "Nate" == Nate Campi <na...@campin.net> writes:

Nate> I'm trying to cache a mod_perl/SSI site, but 1.3 mod_proxy needs headers
Nate> that aren't there.

By default, mod_include is required not to add "Last-modified:" and
"Expires:", because SSI makes the page dynamic.  If you want
it cacheable, you can drive it from the age of the main page using
the XBitHack, or from some mod_expires thing as Perrin suggested.
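With `XBitHack full`, mod_include sends a Last-Modified header taken from the file's own modification time, but only when the file's group-execute bit is set; that is the "age of the main page" option. A rough Python sketch of that rule, for a file mod_include is already parsing (the helper name is hypothetical):

```python
import os
import stat
from email.utils import formatdate
from typing import Optional


def xbithack_full_last_modified(path: str) -> Optional[str]:
    """Last-Modified value under `XBitHack full` for an SSI file (sketch).

    Assumes the file is already being parsed by mod_include; returns None
    when the group-execute bit is clear, meaning no Last-Modified is sent.
    """
    st = os.stat(path)
    if not st.st_mode & stat.S_IXGRP:
        return None
    return formatdate(st.st_mtime, usegmt=True)
```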

But the real question is, what can your expiration policy be?  How
long should the cache cache it?

For example, right now, stonehenge.com is almost completely dynamic.
If you go to www.stonehenge.com/merlyn/, each time you hit reload,
almost all of that page is recomputed (except for some caching of the
breadcrumb bar at the top).  This is even though it goes through a
caching proxy.  That's because I still haven't worked out what to tie
the cache's last-modified or expires to.  Should it be the latest of
all the components?  Should it be only the main page's timestamp?  Or
maybe I should simply expire it in 12 hours, and when I update the
site, flush my cache?  (By the way, the new wrapper-process thing in
Template Toolkit will help me here... because I wanted to trigger some
code to set the last-modified to the latest of all the components of a
page...  and now I can do that easily.  Thanks Andy.)
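The "latest of all the components" option reduces to taking the newest modification time across every file that went into a page. Randal does it inside Template Toolkit; the Python below is only a language-neutral sketch, and it assumes the caller already knows the list of component files (the file names in any example are hypothetical).

```python
import os
from email.utils import formatdate
from typing import Iterable


def page_last_modified(component_paths: Iterable[str]) -> str:
    """Last-Modified for a page built from several files (sketch).

    Uses the newest mtime among the components, so editing any one of
    them makes the whole page look newer. Raises ValueError if no paths
    are given.
    """
    newest = max(os.stat(p).st_mtime for p in component_paths)
    return formatdate(newest, usegmt=True)
```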

It's an interesting question.  And one you'll have to answer for
yourself before we can give any concrete help about your header
problem.  How will you compute "last-modified" and "expires", and handle
"if-modified-since", which are all necessary components for
cacheability?
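Handling "if-modified-since" means comparing the client's date with the page's last-modified time and answering 304 Not Modified when the page is no newer. A minimal Python sketch of that comparison; HTTP dates have one-second resolution, so the page time is truncated, and the function name is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: float,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, else 200 (sketch).

    `last_modified` is the page's modification time as a Unix timestamp.
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date is ignored and the full page is sent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    # HTTP dates carry whole seconds only, so compare at that resolution.
    return 304 if int(last_modified) <= since.timestamp() else 200
```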


Re: mod_perl caching proxy

Posted by Nate Campi <na...@campin.net>.
On Fri, Apr 25, 2003 at 04:18:39PM -0400, Perrin Harkins wrote:
> Nate Campi wrote:
> >I'm trying to cache a mod_perl/SSI site, but 1.3 mod_proxy needs headers
> >that aren't there.
> 
> Is there some reason you can't add them?  You don't need another proxy 
> server to do it.  Just put them in your code, or use mod_expires.

We use expires:

 HTTP/1.1 200 OK
 Date: Fri, 25 Apr 2003 23:09:47 GMT
 Server: Apache/1.3.26 (Unix)
 P3P: CP="IDC DSP COR CURa ADMa DEVa CUSa PSAa IVAa CONo OUR IND UNI STA"
 Cache-Control: max-age=3600
 Expires: Sat, 26 Apr 2003 00:09:47 GMT
 Content-Type: text/html

...that helps with AvantGo proxies that grab content from us, but as for
Squid or mod_proxy we need at least content-length, and preferably
last-modified. SSI can't reliably do that, but a middle-tier sort of
proxy could at least do the content-length.
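The diagnosis above can be checked mechanically against a header dump like the one shown. This Python helper is hypothetical, and its list of wanted headers reflects only what is said here about Squid and mod_proxy, not either proxy's full cacheability rules.

```python
from typing import List, Tuple

# Headers the thread says the caches need; not a complete cacheability check.
WANTED: Tuple[str, ...] = ("Content-Length", "Last-Modified")


def missing_cache_headers(raw_response_head: str) -> List[str]:
    """Return which WANTED headers are absent from a raw response head."""
    lines = raw_response_head.strip().splitlines()
    present = set()
    for line in lines[1:]:  # skip the "HTTP/1.1 200 OK" status line
        name, sep, _value = line.partition(":")
        if sep:
            present.add(name.strip().lower())
    return [h for h in WANTED if h.lower() not in present]
```

Run on the headers above, it reports both Content-Length and Last-Modified as missing.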

> By the way, if you want something more flexible for SSI, Apache::SSI 
> works great on mod_perl 1.x.

That would rule, I'm sure we could easily extend it to support the
in-house mod_include hacks I have to patch into all our apache builds:
<URL:http://hotwired.lycos.com/webmonkey/99/10/index0a_page5.html>. I'm
going to look into that.
-- 
Nate Campi    http://www.campin.net 

Re: mod_perl caching proxy

Posted by Perrin Harkins <pe...@elem.com>.
Nate Campi wrote:
> I'm trying to cache a mod_perl/SSI site, but 1.3 mod_proxy needs headers
> that aren't there.

Is there some reason you can't add them?  You don't need another proxy 
server to do it.  Just put them in your code, or use mod_expires.

By the way, if you want something more flexible for SSI, Apache::SSI 
works great on mod_perl 1.x.

- Perrin