Posted to asp@perl.apache.org by Joshua Chamas <jo...@chamas.com> on 2001/09/12 23:25:15 UTC

ASP Includes Output Caching RFC

Hey,

I'm working on an includes output caching mechanism, based 
on the same dbm cache layer just released for the XSLTCache.
The functionality would allow for the output from an include
to be cached based on some optional key for a specified amount
of time.  So if an include might take 100ms to execute, and 
the cache only takes 5ms to fetch from, there could be
a 20x savings for that part of the script.

This caching layer would eventually extend to caching
whole scripts, and also to a user-level cache extension.

The includes caching API might look like:

  $Response->Include({
      File  => 'file.inc',
      Cache => 3600, # to cache one hour
      Key   => [\%data || \@data || \$data || $data || undef]
    },
    @args
  );

$Response->Include('file.inc', @args) would still be fine as an API,
just as it is now; the above is only an extension.

The Cache argument would be the time in seconds to cache for.

The Key part would be serialized & combined with the file name
to create a lookup for the cache... so if a header was dependent
on the user name in $Session for example, the key might be called 
like:

  Key => $Session->{user_id},

or if the include should change based on the query string data,
you might use:

  Key => $Request->QueryString,
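
As a rough illustration of the lookup described above (not the actual
Apache::ASP implementation), the sketch below serializes a Key value and
combines it with the file name into one digest. The helper name
include_cache_key and the choice of Data::Dumper and Digest::MD5 are
assumptions made for this example only.

```perl
use strict;
use warnings;
use Data::Dumper;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: build a cache lookup string from an include
# file name and an optional key (scalar, reference, or undef).
sub include_cache_key {
    my ($file, $key) = @_;

    # Sort hash keys and drop indentation so that equal data always
    # serializes to the same string.
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    local $Data::Dumper::Terse    = 1;

    my $serialized = defined $key ? Dumper($key) : '';
    return md5_hex(join "\0", $file, $serialized);
}

my $by_user  = include_cache_key('header.inc', 42);
my $by_query = include_cache_key('list.inc', { page => 2, sort => 'name' });
print "$by_user\n$by_query\n";
```

Sorting the hash keys matters here: without it, two hashes with the same
contents could serialize differently and miss the cache.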

The functionality I have now would auto expire should the 
web server be restarted.  This can be useful as a web server 
restart is often associated with a full code publish, which
would usually be done at night off peak.  It seems like this 
could also be an httpd.conf config like 

  PerlSetVar CacheRestartPurge 1

But if it doesn't have to be created, then I won't create it.
Does it seem sensible to auto purge your cache every server
restart? 

-- Josh
_________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks Founder                       Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051

---------------------------------------------------------------------
To unsubscribe, e-mail: asp-unsubscribe@perl.apache.org
For additional commands, e-mail: asp-help@perl.apache.org

Re: ASP Includes Output Caching RFC

Posted by Joshua Chamas <jo...@chamas.com>.

"Joel W. Reed" wrote:
> 
> On Sep 12, joshua@chamas.com contorted a few electrons to say...
> Joshua> Hey,
> Joshua>
> Joshua> I'm working on an includes output caching mechanism, based
> Joshua> on the same dbm cache layer just released for the XSLTCache.
> Joshua> The functionality would allow for the output from an include
> Joshua> to be cached based on some optional key for a specified amount
> Joshua> of time.  So if an include might take 100ms to execute, and
> Joshua> the cache only takes 5ms to fetch from, there could be
> Joshua> a 20x savings for that part of the script.
> Joshua>
> 
> very cool. i thought it might be helpful to see what M$
> does around server-side caching.
> 
> 1). Options in ASP.NET (good data points here -imho)
> 
>   http://msdn.microsoft.com/library/en-us/cpguidnf/html/cpconaspcachingfeatures.asp
> 

Thanks for the link. I think what I will end up implementing
will support & extend the $Response->Cache() syntax for caching
the entire page.  Though certainly supporting the declarative
approach would be a good thing too!

> (it seems they prefer a declarative approach to
>  controlling the caching of pages)
> 
> 2). The info on IIS4.0/ASP seems more sketchy. From
> http://support.microsoft.com/support/kb/articles/Q189/4/09.ASP
> 

This is interesting.  The Expires header should refer to the
browser cache, but it seems that they might be using it
to do a web server level cache too.  I have thought about 
using Expires in this way to enable web server caching,
but I feel that mixing browser-level caching with
web-server-level caching might be confusing.  I'll consider how
Expires might play into this when implementing the
$Response->Cache() feature.

Actually a server level cache might make MORE sense than browser
level for Expires, because web browser clocks may be set to 
whatever, and expires dates are absolute & not relative.

> 3). while on the subject of performance in ASP apps here's
> something interesting:
> 
> (emphasis mine)
> 
> If the user gets impatient, he or she may abandon your ASP page before
> you even start executing their request. If he clicks Refresh or moves
> to a different page on your server, you will have a new request

Unfortunately, it seems that $Response->{IsClientConnected} can
only be updated after a $Response->Flush() because Apache doesn't
update its $r->connection->aborted record until a $r->print() 
is done.  If anyone knows anything different let me know, as then
I could refresh the IsClientConnected value before Script_OnStart runs
which could be useful for globally ending scripts.

Workarounds for this common problem include serializing session
access with SessionSerialize or $Session->Lock(), then doing a flush
& test.  This makes sure requests from the same browser queue up,
so you can check each request one at a time like so:

sub Script_OnStart {
   $Session->Lock();
   $Response->Flush();
   $Response->{IsClientConnected} || $Response->End;
}

Unfortunately, once you flush, your headers have been sent out,
so you can no longer do things like $Response->Cookies() 
or $Response->Expires.

I'll check on the mod_perl list to see if anyone knows
how to detect that a client has aborted without an $r->print() being
called first.

--Josh
_________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks Founder                       Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051

Re: ASP Includes Output Caching RFC

Posted by "Joel W. Reed" <jr...@support.ddiworld.com>.

On Sep 12, joshua@chamas.com contorted a few electrons to say...
Joshua> Hey,
Joshua> 
Joshua> I'm working on an includes output caching mechanism, based 
Joshua> on the same dbm cache layer just released for the XSLTCache.
Joshua> The functionality would allow for the output from an include
Joshua> to be cached based on some optional key for a specified amount
Joshua> of time.  So if an include might take 100ms to execute, and 
Joshua> the cache only takes 5ms to fetch from, there could be
Joshua> a 20x savings for that part of the script.
Joshua> 

very cool. i thought it might be helpful to see what M$
does around server-side caching.

1). Options in ASP.NET (good data points here -imho)

  http://msdn.microsoft.com/library/en-us/cpguidnf/html/cpconaspcachingfeatures.asp

(it seems they prefer a declarative approach to 
 controlling the caching of pages)

2). The info on IIS4.0/ASP seems more sketchy. From
http://support.microsoft.com/support/kb/articles/Q189/4/09.ASP

"ISAPI applications (Active Server Pages Web pages) can be cached on
Internet Information Server. When you create a new IIS 4 application,
caching of ISAPI Applications is on by default. "

and

At the top of the .asp page that you do not want cached, add the
following line:

   <% Response.Expires=0 %> 


3). while on the subject of performance in ASP apps here's
something interesting:

(emphasis mine)

If the user gets impatient, he or she may abandon your ASP page before
you even start executing their request. If he clicks Refresh or moves
to a different page on your server, you will have a new request
sitting at the end of the ASP request queue and a disconnected request
sitting in the middle of the queue. Often this happens when your
server is under high load (so it has a long request queue, with
correspondingly high response times) and this only makes the situation
worse. There's no point executing an ASP page (especially a slow,
heavyweight ASP page) if the user is no longer connected. You can
check for this condition by using the Response.IsClientConnected
property. If it returns False, you should call Response.End and
abandon the rest of the page. IN FACT, IIS 5.0 CODIFIES THIS
PRACTICE: WHENEVER ASP IS ABOUT TO EXECUTE A NEW REQUEST, IT CHECKS TO
SEE HOW LONG THE REQUEST HAS BEEN IN THE QUEUE. IF IT HAS BEEN THERE
FOR MORE THAN 3 SECONDS, ASP WILL CHECK TO SEE IF THE CLIENT IS STILL
CONNECTED AND IMMEDIATELY TERMINATE THE REQUEST IF IT'S NOT. You can
use the AspQueueConnectionTestTime setting in the metabase to adjust
this timeout of 3 seconds.

If you have a page that takes a very long time to execute, you may
also want to check Response.IsClientConnected at intervals. When
response buffering is enabled, it is a good idea to do Response.Flush
at intervals to give the user the impression that something is
happening.

from:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnasp/html/asptips.asp


-- 
------------------------------------------------------------------------
Joel W. Reed                                                412-257-3881
--------------All the simple programs have been written.----------------

Re: ASP Includes Output Caching RFC

Posted by Joshua Chamas <jo...@chamas.com>.

Philip Mak wrote:
> 
> The reason that I suggested a timestamp parameter (which means "regenerate
> the page if the cached copy is older than $timestamp") is that the
> Response->Include function would already have all the information it needs
> for this logic. It has the script filename, it has the key (e.g. query
> string), and it knows the timestamp of the cached file.
> 

Let me mull this over some more.

> > The directive CacheSize, in bytes, already supports this.  The same
> > cache size will be used for the XSLT cache, Includes/Script(TODO) cache,
> > and users cache(TODO).  The default is 10000000, so using all three
> > caches could take nearly 30M.  I don't want each to be specified
> > separately, there are already too many configs as it is.
> 
> Wouldn't it be more intuitive if you made it so that all three caches
> added together can only take CacheSize bytes, rather than letting each
> individual cache take CacheSize bytes?
> 
> The rationale for this is that when deciding what to set CacheSize to,
> someone would base this on how much disk space they have available. It's
> easier if they don't have to multiply by 3.

I can do this, but there will be more cache purging overhead
and I am not sure this is a good thing.  Right now, each DBM cache is
checked for its size only when used ... in this scenario, each DBM
would be checked for its size every request, possibly tripling the
stats, depending on the number of caches used each request.  A last
option would be to purge a cache only when it fills up during
a STORE.

So, depending on the number of caches found, with your suggestion,
I would just take CacheSize / # of caches found per request
to be each cache's limit.  This does make sense, but I don't like
the performance implications.
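
A minimal sketch of the arithmetic described above, assuming an even
split of one CacheSize budget across the caches in use. The helper name
per_cache_limit is hypothetical and not part of Apache::ASP.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CacheSize budget (in bytes) evenly
# across however many caches are in use.
sub per_cache_limit {
    my ($cache_size, $cache_count) = @_;
    return $cache_size if $cache_count < 1;
    return int($cache_size / $cache_count);
}

# With the default CacheSize of 10000000 and three caches in use:
print per_cache_limit(10_000_000, 3), "\n";   # prints 3333333
```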

-- Josh
_________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks Founder                       Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051

Re: ASP Includes Output Caching RFC

Posted by Philip Mak <pm...@aaanime.net>.

On Thu, 13 Sep 2001, Joshua Chamas wrote:

> I think I've got it... your timestamp parameter will be the LastModified key.
>
>   $Response->Include({ LastModified => $timestamp });
>
> This mimics the HTTP RFC for Last-Modified header, seen at:
>  http://www.freesoft.org/CIE/RFC/1945/59.htm

(reads the RFC... "For database gateways, it may be the last-update
timestamp of the record.") Yes, that's exactly what I had in mind.

This will be a great performance win for dynamically generated websites
that are actually static (unless the site admin updates entries in the
database). :) There's one site where I had to write a command line perl
script that compares database timestamps with .html files to decide
whether to regenerate them or not... that was kind of ugly.

Re: ASP Includes Output Caching RFC

Posted by Joshua Chamas <jo...@chamas.com>.

Philip Mak wrote:
> 
> On Wed, 12 Sep 2001, Joshua Chamas wrote:
> 
> > I think that the invalidate parameter is the Clear I suggest,
> > and that Clear could be used to effectively deal with the timestamp
> > issue, but the logic must exist outside the caching mechanism.
> 
> Wouldn't it be a bit complicated to setup this logic, though? I would have
> to create some persistent variable keyed by the filename and Query String
> that keeps track of when the file was last recreated.
> 
> The reason that I suggested a timestamp parameter (which means "regenerate
> the page if the cached copy is older than $timestamp") is that the
> Response->Include function would already have all the information it needs
> for this logic. It has the script filename, it has the key (e.g. query
> string), and it knows the timestamp of the cached file.
> 

I think I've got it... your timestamp parameter will be the LastModified key.

  $Response->Include({ LastModified => $timestamp });

This mimics the HTTP RFC for Last-Modified header, seen at:
 http://www.freesoft.org/CIE/RFC/1945/59.htm

but instead of the server telling the browser when it was last
modified, the developer/script is telling the include/cache when 
it was last modified.  The use is not quite the same, but I think it
captures the meaning well.  If the cached item is older than the
LastModified timestamp, the cache entry will be expired.

How is that?

--Josh

Re: ASP Includes Output Caching RFC

Posted by Philip Mak <pm...@aaanime.net>.

On Wed, 12 Sep 2001, Joshua Chamas wrote:

> I think that the invalidate parameter is the Clear I suggest,
> and that Clear could be used to effectively deal with the timestamp
> issue, but the logic must exist outside the caching mechanism.

Wouldn't it be a bit complicated to setup this logic, though? I would have
to create some persistent variable keyed by the filename and Query String
that keeps track of when the file was last recreated.

The reason that I suggested a timestamp parameter (which means "regenerate
the page if the cached copy is older than $timestamp") is that the
Response->Include function would already have all the information it needs
for this logic. It has the script filename, it has the key (e.g. query
string), and it knows the timestamp of the cached file.

I think that it would be simple to implement too, if you have the Clear
parameter already. It would just be:

if ($args{timestamp} && $timestamp_of_cached_file < $args{timestamp}) {
  $args{clear} = 1;
}

Rationale:

The only reason I can think of invalidating the cache is that something
has changed. A timestamp should always be usable for determining whether
something changed before or after the cache file was calculated. If there
are multiple timestamps (e.g. the include file depends on the timestamp of
"a.asp" and "b.asp"), then they can be max()'d before being passed to
$Response->Include. That's why I think it's a good idea to have
"timestamp" be a parameter to $Response->Include; timestamp could be the
number of seconds since 1970 (i.e. a standard UNIX timestamp).
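
To make the max() idea above concrete, here is a small sketch, assuming
all timestamps are UNIX epoch seconds. The helper name cache_is_stale is
hypothetical; List::Util's max is a core Perl function.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: true when the cached copy predates the newest
# of the timestamps it depends on (all in seconds since 1970).
sub cache_is_stale {
    my ($cached_at, @dependency_times) = @_;
    return 0 unless @dependency_times;
    return $cached_at < max(@dependency_times) ? 1 : 0;
}

# Example: mtimes of two files the include depends on. Here the
# running script stands in for "a.asp" and "b.asp".
my @mtimes = map { (stat $_)[9] } ($0, $0);
print cache_is_stale(time, @mtimes) ? "stale\n" : "fresh\n";
```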

> The directive CacheSize, in bytes, already supports this.  The same
> cache size will be used for the XSLT cache, Includes/Script(TODO) cache,
> and users cache(TODO).  The default is 10000000, so using all three
> caches could take nearly 30M.  I don't want each to be specified
> separately, there are already too many configs as it is.

Wouldn't it be more intuitive if you made it so that all three caches
added together can only take CacheSize bytes, rather than letting each
individual cache take CacheSize bytes?

The rationale for this is that when deciding what to set CacheSize to,
someone would base this on how much disk space they have available. It's
easier if they don't have to multiply by 3.

Re: ASP Includes Output Caching RFC

Posted by Joshua Chamas <jo...@chamas.com>.

Philip Mak wrote:
> 

Thanks for the feedback.  It's much appreciated.

> On Wed, 12 Sep 2001, Joshua Chamas wrote:
> 
> >   $Response->Include({
> >         File  => 'file.inc',
> >         Cache => 3600, # to cache one hour
> >         Key   => [\%data || \@data || \$data || $data || undef]
> >       },
> >       @args
> >       );
> 
> Why is there a time limit on how long something can be cached? Wouldn't
> someone usually want the page to be cached indefinitely (space
> permitting)? I guess if someone doesn't specify a time limit, it should be
> infinite.
> 

OK, a Cache => 1 key to start caching, with an Expires => 3600
to specify for only one hour.  Without Expires set, the caching
would be indefinite, or until the web server restarts.  See below 
for full API example.
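
The expiry rule just described (no Expires means cache indefinitely)
could be checked along these lines. This is only a sketch: the helper
name entry_is_fresh and its argument order are assumptions, not the
module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a cached entry is still usable.
# $stored_at and $now are epoch seconds; $expires is a lifetime in
# seconds, or undef to cache indefinitely.
sub entry_is_fresh {
    my ($stored_at, $expires, $now) = @_;
    return 1 unless defined $expires;      # no Expires: never times out
    return ($now - $stored_at) < $expires ? 1 : 0;
}

my $now = time;
print entry_is_fresh($now - 7200, 3600,  $now) ? "fresh" : "expired", "\n"; # expired
print entry_is_fresh($now - 7200, undef, $now) ? "fresh" : "expired", "\n"; # fresh
```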

> I'm guessing there should be a directive that allows someone to set the
> maximum number of bytes that the cache is allowed to take (e.g. PerlSetVar
> MaxCacheSize), and some LRU/LFU policy could be employed when the cache
> fills up.
> 

The directive CacheSize, in bytes, already supports this.  The same
cache size will be used for the XSLT cache, Includes/Script(TODO) cache, and
users cache(TODO).  The default is 10000000, so using all three caches
could take nearly 30M.  I don't want each to be specified separately, 
there are already too many configs as it is.

> Let's see...we have three kinds of server restarts:
> 
> 1. apachectl graceful
> 2. apachectl restart
> 3. apachectl stop; apachectl start
> 

The kind of cache expiration I would implement would probably
occur under both scenarios 2 and 3, because the cache expiration uses
the process pid $$ of the parent httpd as part of the key.  I believe
graceful keeps the parent httpd the same.
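
One way to picture the pid-based expiration, as a sketch only: if the
parent pid is part of every key, a new parent process produces new keys
and the old entries are simply never found again. The helper name
restart_scoped_key is hypothetical, and which restart types actually
change the parent pid is left as stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix a cache key with the pid of the parent
# httpd. A new parent pid yields new keys, so entries written under
# the old pid are no longer looked up.
sub restart_scoped_key {
    my ($parent_pid, $key) = @_;
    return join ':', $parent_pid, $key;
}

# In an Apache child process, getppid() would give the parent httpd's pid.
my $parent_pid = getppid();
print restart_scoped_key($parent_pid, 'header.inc'), "\n";
```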

> I'm guessing you were asking about case #2. IIRC, when someone does
> "apachectl restart", all the modules are recompiled, so I think you'd HAVE
> to clear the cache too, otherwise weird things can happen.
> 

It all depends on the parent pid under this condition.  Me, I always
stop/start now.  I got tired of dealing with all the oddities that
come up under graceful/restart.

> As for case #1, it might be useful to be able to clear the cache without
> shutting down the server completely, although I'm not saying that *I* need
> that functionality.
> 

Hmmm, let's hope no one wants this. :)

> Another thought: What if I want the entire contents of a script to be
> cached? e.g. I want "page.asp" to be cached based on its query string.
> Would I have to put this:
> 
> <% $Response->Include({
>   File => 'real_page.inc',
>   Key => $Request->QueryString
> }); %>
> 

For now, yes, but I would like an API that deals with this too, just
not yet.  I imagine it would look like 

  $Response->Cache({ Key => $data, Expires => 3600 });

That said, perhaps the include cache API would look better as

  $Response->Include({ 
	Cache => 1,
	Expires => 3600, # or none for forever	
	Key => $data,
   }, @args);

That would certainly look more consistent.  There will be a user
cache one day too, and I think it will look like:

  $Server->Cache($key, [$value, $expires]);
    or
  $Server->Cache( { Key => $key, [ Value => $value, Expires => 7200 ] });

> Another thought: How would we do conditional cache refreshing? e.g. let's
> say I have a script "view.asp?id=5". The results of this script should be
> cached, BUT if the entry in the database for 5 has changed, it should not
> be pulled from the cache. How about something like this, then:
> 
> <%
> my $id = $Request->QueryString('id');
> my ($time) = $dbh->selectrow_array("SELECT modified FROM
>   data WHERE id=".$dbh->quote($id));
> $Response->Include({
>   file => 'view.inc',
>   key => $id,
>   timestamp => $time
> });
> %>
> 

Maybe a Clear argument like:

  $Response->Include({ 
	Cache => 1,
	Expires => 3600, # or none for forever	
	Key => $data,
	Clear => 1
   }, @args);

This would force the data to be re-cached.  Clear
might also take a sub ref that if executed and returned
true would also force the include to be recached.

  $Response->Include({ 
	Cache => 1,
	Expires => 3600, # or none for forever	
	Key => $data,
	Clear => sub { &check_db($Session) }
   }, @args);
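
A sketch of how a Clear argument that is either a plain flag or a sub
ref might be interpreted. The helper name should_clear is hypothetical
and only illustrates the two cases described above.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a Clear argument that may be a plain
# true/false value or a sub ref whose return value decides.
sub should_clear {
    my ($clear) = @_;
    return 0 unless defined $clear;
    return $clear->() ? 1 : 0 if ref $clear eq 'CODE';
    return $clear ? 1 : 0;
}

print should_clear(1), "\n";                 # prints 1
print should_clear(sub { 0 }), "\n";         # prints 0
```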

> to invalidate that cache entry. "modified" would be a TIMESTAMP column in
> MySQL, which automatically gets touched if that row is updated. There
> could also be an "invalidate" parameter, which if true, will tell
> $Response->Include not to pull the page from the cache under any
> circumstances. It's probably useless if we have "timestamp" available,

I think that the invalidate parameter is the Clear I suggest,
and that Clear could be used to effectively deal with the timestamp
issue, but the logic must exist outside the caching mechanism.

> 1. Add a MaxCacheSize config parameter to allocate disk space for the cache
> 2. Clear cache on "restart"; maybe make config parameter to allow clearing
>    cache on "graceful"
> 3. Is it a bit cumbersome to need to create an entire file just as a cache
>    directive? Can we do better?
> 4. Perhaps add a "timestamp" and/or "invalidate" parameter to
>    $Response->Include to allow more powerful cache control
> 
> Ack, did I create more questions than I answered? :)
> 

I think I hit all these.  Let me know whether you like the 
Expires & Clear keys as additional arguments to the 
$Response->Include(\%attr) API.  You'll have to wait for 
the equivalent $Response->Cache() & $Server->Cache() though!

--Josh
_________________________________________________________________
Joshua Chamas                           Chamas Enterprises Inc.
NodeWorks Founder                       Huntington Beach, CA  USA 
http://www.nodeworks.com                1-714-625-4051

Re: ASP Includes Output Caching RFC

Posted by Philip Mak <pm...@aaanime.net>.

On Wed, 12 Sep 2001, Joshua Chamas wrote:

>   $Response->Include({
> 	  File  => 'file.inc',
> 	  Cache => 3600, # to cache one hour
> 	  Key   => [\%data || \@data || \$data || $data || undef]
> 	},
> 	@args
> 	);

Why is there a time limit on how long something can be cached? Wouldn't
someone usually want the page to be cached indefinitely (space
permitting)? I guess if someone doesn't specify a time limit, it should be
infinite.

I'm guessing there should be a directive that allows someone to set the
maximum number of bytes that the cache is allowed to take (e.g. PerlSetVar
MaxCacheSize), and some LRU/LFU policy could be employed when the cache
fills up.

> But if it doesn't have to be created, then I won't create it.
> Does it seem sensible to auto purge your cache every server
> restart?

Let's see...we have three kinds of server restarts:

1. apachectl graceful
2. apachectl restart
3. apachectl stop; apachectl start

It would make sense to clear the cache when #3 happens, since that's a
cold restart and everything else is cleared.

I'm guessing you were asking about case #2. IIRC, when someone does
"apachectl restart", all the modules are recompiled, so I think you'd HAVE
to clear the cache too, otherwise weird things can happen.

As for case #1, it might be useful to be able to clear the cache without
shutting down the server completely, although I'm not saying that *I* need
that functionality.

Another thought: What if I want the entire contents of a script to be
cached? e.g. I want "page.asp" to be cached based on its query string.
Would I have to put this:

<% $Response->Include({
  File => 'real_page.inc',
  Key => $Request->QueryString
}); %>

and then put the entire real page in real_page.inc? That seems a bit
cumbersome to have to create a whole file just for one cache directive,
but I don't have an alternative suggestion at the moment.

Another thought: How would we do conditional cache refreshing? e.g. let's
say I have a script "view.asp?id=5". The results of this script should be
cached, BUT if the entry in the database for 5 has changed, it should not
be pulled from the cache. How about something like this, then:

<%
my $id = $Request->QueryString('id');
my ($time) = $dbh->selectrow_array("SELECT modified FROM
  data WHERE id=".$dbh->quote($id));
$Response->Include({
  file => 'view.inc',
  key => $id,
  timestamp => $time
});
%>

So, if the cache has an entry for "view.inc?id=5", it will compare the
time of the cache file with $args{timestamp} to determine whether it needs
to invalidate that cache entry. "modified" would be a TIMESTAMP column in
MySQL, which automatically gets touched if that row is updated. There
could also be an "invalidate" parameter, which if true, will tell
$Response->Include not to pull the page from the cache under any
circumstances. It's probably useless if we have "timestamp" available,
though.

In summary, my suggestions are:

1. Add a MaxCacheSize config parameter to allocate disk space for the cache
2. Clear cache on "restart"; maybe make config parameter to allow clearing
   cache on "graceful"
3. Is it a bit cumbersome to need to create an entire file just as a cache
   directive? Can we do better?
4. Perhaps add a "timestamp" and/or "invalidate" parameter to
   $Response->Include to allow more powerful cache control

Ack, did I create more questions than I answered? :)
