Posted to modperl@perl.apache.org by David Dyer-Bennet <dd...@dd-b.net> on 2002/08/06 19:15:35 UTC

Caching control

I've read caching tutorials, and have a tolerable understanding for
what a validator is, and how setting a validator differs from
providing a max-age parameter.  However this is "book-learning", I
haven't worked with it enough to really understand if I understand, if
you see what I mean.

As is common on sites, I'm working with a fully-dynamic homepage, with
most of the content generated from database queries.  It also has a
box which shows whether the user is currently logged in to the site
(and provides links and form fields to manage that state).  I'm low on
cpu horsepower (this is a fan site for a fantasy author, hosted out of
my basement) and would like NOT to force the homepage to be
non-cached.  I'm happy to set an arbitrary timeframe for how often it
actually checks to see if the database has changed so as to cause
different content to appear on the page.  However, I haven't
implemented anything that deals with getting the login box to update
reliably when they log in as one user, go to a page, log in as another,
and return to the homepage via a link. 

The only thing I've been able to think of that might handle this is to
generate a strong validator myself, and to handle HEAD requests, and
set the caching mode so it checks for each access.  That's a *big*
step off into deep water for me; anybody know if this is the right
general approach for this problem?

And if it *is* a useful approach, can somebody point me at the
mod_perl constructs for handling head requests?  And any pointers,
tips, traps, pitfalls, etc.?
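
[To make the "strong validator" idea concrete, here is a hedged sketch --
the helper names are invented, not from any module: compute an ETag from
the rendered body and compare it against the client's If-None-Match; when
it matches, a mod_perl handler can answer 304 instead of re-sending the
body.]

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a strong validator from the rendered page body.
sub make_etag {
    my ($body) = @_;
    return '"' . md5_hex($body) . '"';   # ETags are quoted strings
}

# True if the client's If-None-Match header covers our current ETag,
# i.e. we may answer 304 Not Modified instead of re-sending the body.
sub client_copy_is_fresh {
    my ($if_none_match, $etag) = @_;
    return 0 unless defined $if_none_match;
    return 1 if $if_none_match eq '*';
    return (grep { $_ eq $etag } split /\s*,\s*/, $if_none_match) ? 1 : 0;
}
```

[In a handler this comparison would sit next to the If-None-Match request
header and a 304 return; note that Apache itself answers HEAD requests by
running the normal GET response and discarding the body, so a separate
HEAD code path is usually unnecessary.]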

Another approach I've considered is frames.  But ick.  And there's
always trouble trying to control updates to other frames, and the
solution always seems to be javascript, and ick.  And it doesn't work
well for people using screen readers.  And ick.  And have I mentioned
that I just don't *like* frames very well?

Watching the behavior of my system, I have the feeling that many of
the big commercial sites simply force the pages not to cache, and
generate them on each request.  Is that laziness, or the only really
reliable way to handle the issue?

(The site is actually implemented in Mason, but I'm reasonably sure
the tools/concepts I need for this aren't Mason-specific; they're
details of the mod_perl interface to Apache.)
-- 
David Dyer-Bennet, dd-b@dd-b.net  /  New TMDA anti-spam in test
 John Dyer-Bennet 1915-2002 Memorial Site http://john.dyer-bennet.net
        Book log: http://www.dd-b.net/dd-b/Ouroboros/booknotes/
         New Dragaera mailing lists, see http://dragaera.info

Re: Caching control

Posted by Perrin Harkins <pe...@elem.com>.
David Dyer-Bennet wrote:
> I'm low on
> cpu horsepower (this is a fan site for a fantasy author, hosted out of
> my basement) and would like NOT to force the homepage to be
> non-cached.  I'm happy to set an arbitrary timeframe for how often it
> actually checks to see if the database has changed so as to cause
> different content to appear on the page.  However, I haven't
> implemented anything that deals with getting the login box to update
> reliably when they log in as one user, go to a page, log in as another,
> and return to the homepage via a link.

One solution is to cache these pieces separately.  You can cache the 
data or the generated HTML for each section on its own, so that although 
the browser has to contact your server each time, the server does very 
little work to assemble the page.  Use a system like Cache::Cache to 
store the cached pieces, and invalidating the content when it changes 
becomes simple.
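
[A dependency-free sketch of the fragment-caching pattern Perrin
describes -- Cache::Cache (e.g. Cache::FileCache) packages up the same
idea with on-disk storage; the function names and the login-box HTML
here are invented for the example:]

```perl
use strict;
use warnings;

# Minimal in-process fragment cache with expiry.
my %cache;   # key => [ value, expires_at ]

sub cache_set {
    my ($key, $value, $ttl) = @_;
    $cache{$key} = [ $value, time() + $ttl ];
}

sub cache_get {
    my ($key) = @_;
    my $entry = $cache{$key} or return undef;
    return undef if time() >= $entry->[1];   # expired
    return $entry->[0];
}

# Invalidate explicitly when the underlying data changes.
sub cache_remove { delete $cache{ shift() } }

# Assemble a page fragment, regenerating only on a cache miss.
sub login_box_html {
    my ($user) = @_;
    my $key  = "login_box:$user";
    my $html = cache_get($key);
    unless (defined $html) {
        $html = "<div>Logged in as $user</div>";   # stand-in for real work
        cache_set($key, $html, 600);               # ten minutes
    }
    return $html;
}
```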

However, if your server is really overwhelmed, then Joachim's 
suggestion of using unique URLs will get you further.  At eToys we 
actually did both.  We cached the data used to make the homepage, and 
used the unique URL technique to cache the generated page in our 
front-end proxy servers.  When you hit the site, you would be 
immediately redirected to a unique URL generated from your user ID.  The 
ID was munged in the URL so that it just looked like some funky string 
tacked onto the end.  We did all of this in our front-end proxy servers, 
using mod_rewrite, and cached the generated page there for a few minutes.
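
[The redirect half of that setup might look roughly like this in
mod_rewrite -- the cookie name and URL layout are invented for the
sketch, not taken from the actual eToys configuration:]

```apache
RewriteEngine On

# Bounce the bare homepage to a per-user URL built from the session
# cookie, so each user's page gets its own cacheable address.
RewriteCond %{HTTP_COOKIE} user_id=([^;]+)
RewriteRule ^/$ /home/%1/ [R=302,L]

# Responses for /home/<id>/ can then be cached in the front-end
# proxy for a few minutes.
```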

- Perrin


Re: Caching control

Posted by Joachim Zobel <nc...@netcologne.de>.
At 19:40 06.08.2002 -0500, you wrote:
>Joachim Zobel <nc...@netcologne.de> writes:
> > This is far too complicated IMHO. The easiest way to get the caching
> > behaviour of static pages is to use an SSI "frame" with XBitHack full
> > around the pages. Every time you modify the database, you touch the
> > frame. This is a bit tricky. The "frame" is not writeable by the
> > webserver user (this is good!), so _this_ user can't touch it.
>
>I always prefer simpler solutions.  I think it would actually be
>simpler to skip the SSI layer and just generate my own last-modified
>or equivalent, though.

This is not that simple. You also have to take care of the 304 status.

Apache::HeavyCGI does this if I recall correctly.

hth,
Joachim

--
"... a race of inventive dwarfs who can be hired for anything."
                                     - Bertolt Brecht - Leben des Galilei


Re: Caching control

Posted by David Dyer-Bennet <dd...@dd-b.net>.
Joachim Zobel <nc...@netcologne.de> writes:

> At 12:15 06.08.2002 -0500, you wrote:
> >As is common on sites, I'm working with a fully-dynamic homepage, with
> >most of the content generated from database queries.  It also has a
> >box which shows whether the user is currently logged in to the site
> >(and provides links and form fields to manage that state).  I'm low on
> >cpu horsepower (this is a fan site for a fantasy author, hosted out of
> >my basement) and would like NOT to force the homepage to be
> >non-cached.  I'm happy to set an arbitrary timeframe for how often it
> >actually checks to see if the database has changed so as to cause
> >different content to appear on the page.  However, I haven't
> >implemented anything that deals with getting the login box to update
> >reliably when they log in as one user, go to a page, log in as another,
> >and return to the homepage via a link.
> >
> >The only thing I've been able to think of that might handle this is to
> >generate a strong validator myself, and to handle HEAD requests, and
> >set the caching mode so it checks for each access.  That's a *big*
> >step off into deep water for me; anybody know if this is the right
> >general approach for this problem?
> >
> >And if it *is* a useful approach, can somebody point me at the
> >mod_perl constructs for handling head requests?  And any pointers,
> >tips, traps, pitfalls, etc.?
> 
> This is far too complicated IMHO. The easiest way to get the caching
> behaviour of static pages is to use an SSI "frame" with XBitHack full
> around the pages. Every time you modify the database, you touch the
> frame. This is a bit tricky. The "frame" is not writeable by the
> webserver user (this is good!), so _this_ user can't touch it.

I always prefer simpler solutions.  I think it would actually be
simpler to skip the SSI layer and just generate my own last-modified
or equivalent, though.  

> The thing you then need to do is to have the URL fully reflect the
> state of the page. If the user logs in, you either add an user=joe to
> the query string or use use URL rewriting to put /user-joe/ in the URL
> path. The second has the advantage that it is preserved if you use
> relative links.

Now, I really should have thought of this.  I was pretty careful about
planning the rest of the URL that way, but I somehow didn't think of
including the session identity; too tied up with thinking of that as
being in the session cookie I guess.  

If I do this I'll have to be careful that sending a URL doesn't give
away access to your private account on the site, but that's doable;
keep the session in the cookie but duplicate the user ID in the URL. 

> The main reason to support caching, however, is not to save computing
> power or bandwidth. Browser caching means end-user performance.

Right, my cpu worries were really about the performance seen by the
user; it's just that *I* know the problem is I produce the pages
rather slowly, so far, so *I* think of it as a cpu use problem. 
-- 
David Dyer-Bennet, dd-b@dd-b.net  /  New TMDA anti-spam in test
 John Dyer-Bennet 1915-2002 Memorial Site http://john.dyer-bennet.net
        Book log: http://www.dd-b.net/dd-b/Ouroboros/booknotes/
         New Dragaera mailing lists, see http://dragaera.info

Re: Caching control

Posted by Joachim Zobel <nc...@netcologne.de>.
At 12:15 06.08.2002 -0500, you wrote:
>As is common on sites, I'm working with a fully-dynamic homepage, with
>most of the content generated from database queries.  It also has a
>box which shows whether the user is currently logged in to the site
>(and provides links and form fields to manage that state).  I'm low on
>cpu horsepower (this is a fan site for a fantasy author, hosted out of
>my basement) and would like NOT to force the homepage to be
>non-cached.  I'm happy to set an arbitrary timeframe for how often it
>actually checks to see if the database has changed so as to cause
>different content to appear on the page.  However, I haven't
>implemented anything that deals with getting the login box to update
>reliably when they log in as one user, go to a page, log in as another,
>and return to the homepage via a link.
>
>The only thing I've been able to think of that might handle this is to
>generate a strong validator myself, and to handle HEAD requests, and
>set the caching mode so it checks for each access.  That's a *big*
>step off into deep water for me; anybody know if this is the right
>general approach for this problem?
>
>And if it *is* a useful approach, can somebody point me at the
>mod_perl constructs for handling head requests?  And any pointers,
>tips, traps, pitfalls, etc.?

This is far too complicated IMHO. The easiest way to get the caching 
behaviour of static pages is to use an SSI "frame" with XBitHack full 
around the pages. Every time you modify the database, you touch the frame. 
This is a bit tricky. The "frame" is not writeable by the webserver user 
(this is good!), so _this_ user can't touch it.
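
[Sketched as configuration -- the filename is illustrative: with
XBitHack, the owner-execute bit on an .html file turns on SSI parsing,
and the "full" setting makes Apache also send a Last-Modified header
taken from the group-execute-bit file's mtime.]

```apache
# httpd.conf
XBitHack full

# homepage.html (chmod ug+x homepage.html) -- the static "frame"
# contains something like:
#   <!--#include virtual="/dynamic/homepage-body" -->
#
# Touching homepage.html after a database change updates its mtime,
# so caches revalidating against Last-Modified fetch a fresh copy.
```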

The thing you then need to do is to have the URL fully reflect the state of 
the page. If the user logs in, you either add a user=joe to the query 
string or use URL rewriting to put /user-joe/ in the URL path. The 
second has the advantage that it is preserved if you use relative links.
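
[A minimal sketch of the second option -- helper names are mine: put the
user into the path on the way out and read it back on the way in.]

```perl
use strict;
use warnings;

# Hypothetical helpers for carrying the login state in the URL path
# (the /user-joe/ scheme) rather than only in a cookie.
sub add_user_to_path {
    my ($user, $path) = @_;
    return "/user-$user$path";
}

sub user_from_path {
    my ($path) = @_;
    return $path =~ m{^/user-([^/]+)} ? $1 : undef;
}
```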

If you have only small pieces of information that change, you can consider 
setting cookies and putting the information into images. If you only want 
to display the user's name (or the fact that the user is logged in), you 
can write a handler that creates, and later chooses, an image with the 
user's name depending on that cookie. Finally we are on topic :-) You need 
mod_perl to have a handler return a selected image at "static" speed.
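
[The cookie-to-image selection reduces to one function -- the cookie
name and path scheme are invented for the sketch; a small mod_perl
handler would call this and send the chosen file:]

```perl
use strict;
use warnings;

# Hypothetical: map the Cookie request header to a pre-generated
# image showing the user's name; anonymous visitors share one image
# (and therefore one cache entry).
sub name_image_for {
    my ($cookie_header) = @_;
    my ($user) = ($cookie_header || '') =~ /\buser=(\w+)/;
    return defined $user
        ? "/name-images/$user.png"
        : "/name-images/anonymous.png";
}
```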

The main reason to support caching, however, is not to save computing power 
or bandwidth. Browser caching means end-user performance.

> [no to frames]

Agreed.

Hth,
Joachim

--
"... a race of inventive dwarfs who can be hired for anything."
                                     - Bertolt Brecht - Leben des Galilei