Posted to modperl@perl.apache.org by Mark Stosberg <ma...@summersault.com> on 2007/01/11 21:44:26 UTC

Finding a balance between caching and stat tracking

Hello,

I'm researching how best to integrate caching with a mod_perl website.
For some important cases I want to do some customized stat tracking for
particular queries, even though they return the same content, which
would otherwise be directly cacheable.

The approach that seems to fit this well is the following, based on
research (but not yet practice!):

1. Include a "Last-Modified" header as often as possible. As I
understand, this doesn't mean "cache me and don't come back", but is an
invitation to simply re-validate later.

2. Look for "If-Modified-Since" headers in incoming requests, meaning
that the client has stored a copy.  When this kind of Conditional GET
comes in, I can still run the tracking code unconditionally. However, I
could return a 304 if the cache was up to date, or return the standard
content otherwise.
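
A minimal sketch of the two steps above, written in Python only to keep it self-contained; a real mod_perl handler would look different. The names handle_request, hits and render_body are hypothetical, and last_modified is assumed to be a timezone-aware UTC datetime:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

hits = []  # hypothetical stand-in for the real stat-tracking store


def handle_request(request_headers, last_modified, render_body):
    """Return (status, headers, body) for a GET, tracking every request."""
    hits.append(datetime.now(timezone.utc))  # tracking runs unconditionally

    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    ims = request_headers.get("If-Modified-Since")
    if ims:
        try:
            client_copy = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            client_copy = None  # unparseable date: send the full response
        if client_copy is not None and client_copy.tzinfo is None:
            # Assumption: treat a date without a zone as UTC.
            client_copy = client_copy.replace(tzinfo=timezone.utc)
        if client_copy is not None and last_modified <= client_copy:
            return 304, headers, b""  # client copy is current: no body

    return 200, headers, render_body()
```

The tracking line comes before any cache decision, so a 304 still counts as a hit, and render_body is only called when a full response is actually needed.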

Is that a reasonable-sounding approach for the
want-caching-but-need-stats case? Or should I be working on adjusting my
thinking to deal with less accurate stats to improve caching performance?

Thanks!

  Mark


Re: Finding a balance between caching and stat tracking

Posted by Jonathan Vanasco <jv...@2xlp.com>.
my suggestion would be this:

   use a lightweight proxy on port 80.  i like nginx.
   have the proxy handle as much stat tracking as possible -- write to
a logfile something like "%{cookieID}\t%{request_arg}\t%{trackedData}"
   then just batch that overnight.

if you do that for static pages, you can kill a GIANT amount of  
traffic directed at mod_perl, and still have your tracking
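
a rough sketch of what the overnight batch step could look like, in Python just to keep it self-contained.  it assumes each log line has exactly the three tab-separated fields from the format above; the function name summarize and the choice to count hits per (cookieID, request_arg) pair are made up for illustration:

```python
from collections import Counter


def summarize(log_lines):
    """Count hits per (cookie_id, request_arg) from tab-separated log lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed lines rather than abort the batch
        cookie_id, request_arg, _tracked_data = fields
        counts[(cookie_id, request_arg)] += 1
    return counts
```

any iterable of lines works, so an open file handle on the proxy's logfile can be passed straight in.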

personally, i've pretty much given up on caching non-binary content.
nearly every page i serve now is dynamic: everything needs a login
box or a 'hello' box.  i heavily cache the objects used to generate
pages, but i've seen a rapid decrease in cacheable items.


// Jonathan Vanasco

| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| FindMeOn.com - The cure for Multiple Web Personality Disorder
| Web Identity Management and 3D Social Networking
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
| RoadSound.com - Tools For Bands, Stuff For Fans
| Collaborative Online Management And Syndication Tools
| - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -



Re: Finding a balance between caching and stat tracking

Posted by Perrin Harkins <pe...@elem.com>.
On Thu, 2007-01-11 at 15:44 -0500, Mark Stosberg wrote:
> I'm researching how to best integrate caching with a mod_perl website.
> For some important cases I want to do some customized stat tracking for
> particular queries, although they otherwise return the same content,
> which would otherwise be  directly cacheable.

The common approach these days is a web bug, using an IMG tag or
JavaScript, e.g. Google Analytics.  You let the page get cached, but
force that part to hit your server each time.
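
As a rough illustration of the server side of an IMG-tag web bug, here is a sketch in Python (chosen only to keep it self-contained, not mod_perl code). The names serve_pixel and tracked are hypothetical, and the "no-store" header is one way to ask caches not to keep the image; how strictly clients honor it is outside your control:

```python
# A 1x1 transparent GIF, spelled out byte by byte.
PIXEL = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
    b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
    b"\x02\x02D\x01\x00;"
)

tracked = []  # hypothetical stand-in for the real stats store


def serve_pixel(query_string):
    """Record the hit, then return (status, headers, body) for the image."""
    tracked.append(query_string)
    headers = {
        "Content-Type": "image/gif",
        "Content-Length": str(len(PIXEL)),
        "Cache-Control": "no-store",  # ask caches not to keep this response
    }
    return 200, headers, PIXEL
```

The cached page would then embed something like an IMG tag pointing at this endpoint, with whatever you want to track in the query string.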

I don't see any reason why your idea with If-Modified-Since wouldn't
work, except that you really have no control over what clients will do
and some of them (e.g. proxy servers) may just cache your stuff and not
ask for a while if they get a Last-Modified header.

I also recommend you take a look at Michael Radwin's slides on how Yahoo
deals with some of this:
http://www.radwin.org/michael/blog/2004/07/http_caching_and_cachebust.html

- Perrin