
Posted to dev@cocoon.apache.org by Miles Elam <mi...@pcextremist.com> on 2002/08/25 21:22:56 UTC

A lurker's RT -- proactive cache

First, some background: The server's a RedHat box running Cocoon w/ 
Tomcat and TUX in front.  TUX is a kernel-level, simple web server 
(basically static content) that shoots pages out of the system page 
cache at alarming speeds.  With appropriate hardware, it copies data 
directly to network interface buffers rather than to main memory first, 
and if it can't serve the request, it passes control of the socket to 
userland -- in my case, Tomcat/Cocoon on port 8080.  This also has the 
advantage of allowing Tomcat and the JVM to run as "nobody" since TUX can 
listen on port 80 and pass requests on to non-privileged ports above 1024.  
But I digress...

The initial purpose of this setup was to allow Tomcat/Cocoon the 
handling of dynamic data and TUX to handle static content -- their 
respective strong suits.  To put it in perspective, TUX can not only 
saturate a 100baseT link on meager Pentiums but has extraordinarily low 
CPU and memory usage.  The less CPU/mem being used for static content, 
the more Tomcat/Cocoon has for the dynamic stuff.  This is the crux of 
my current mindset.  As I was working, I found myself wishing that they 
could have their caches combined -- or rather that TUX could have access 
to Tomcat/Cocoon's dynamic data cache.  This was dismissed as a pipe 
dream very quickly; native (kernel!) code tinkering around in a userland 
JVM instance can hardly be called a well-considered design.

But then I thought, "Why not have Cocoon's cache output files to a 
directory path that TUX can later serve?"  However, this ran into the 
problem of cache expiration.  If Cocoon's cache (which from now on I'll 
refer to as a "reactive cache") were to output a standard, serialized 
file for TUX to serve, TUX would indeed serve it, but it would no longer 
let any requests for that page through to Tomcat/Cocoon.  Thus the 
reactive cache's entries would never be accessed again and so would never 
be checked or expired.  Stale data would be served forevermore.

So would there be a way to allow Cocoon to expire cache entries 
immediately when prerequisite files were altered, database tables 
updated, etc.?  Not from within Java itself, no.  Not without excessive 
polling of the filesystem and other data stores.  This would affect the 
performance of the system so adversely as to make any gains moot.  So 
what about opening up that Pandora's box which contains such items as 
"JNI code in Cocoon"?

...I'll wait for the jeering, insults, and epithets to die down...

With something like SGI's fam (File Alteration Monitor) library, for 
example, it would be possible to expire a cache entry as soon as a 
prerequisite was changed.  On a fam event, the serialized file in the 
document tree would be deleted, causing TUX to route subsequent requests 
to userland, where the page would be generated again.  Obviously this 
would be horrible for absolutely dynamic data (for which any cache is 
merely a drain on the system), but it would be a performance giant (in 
theory, of course) for minimally dynamic content.  It would have all of 
the advantages of Cocoon CLI pre-generation while still allowing dynamic 
updates.
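
A minimal Java sketch of the delete-on-change idea described above.  The 
class and method names (ExportedPageInvalidator, register, sourceChanged) 
are hypothetical and are not Cocoon API.  How the change event arrives 
(fam through JNI, or anything else) is left out; the sketch only assumes 
that something calls sourceChanged() with the path that was modified.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: maps each source file to the exported pages built from it. */
class ExportedPageInvalidator {
    private final Map<Path, Set<Path>> pagesBySource = new ConcurrentHashMap<>();

    /** Record that exportedPage (in the front-end server's document tree) depends on source. */
    void register(Path source, Path exportedPage) {
        pagesBySource
            .computeIfAbsent(source, key -> ConcurrentHashMap.newKeySet())
            .add(exportedPage);
    }

    /**
     * Assumed to be called by whatever delivers change events.
     * Deletes every exported page that depends on the changed source, so the
     * next request for it falls through to the application server.
     * The mapping is dropped too; it is expected to be registered again when
     * the page is regenerated. Returns the number of files actually removed.
     */
    int sourceChanged(Path source) throws IOException {
        Set<Path> pages = pagesBySource.remove(source);
        if (pages == null) {
            return 0; // nothing exported depends on this source
        }
        int deleted = 0;
        for (Path page : pages) {
            if (Files.deleteIfExists(page)) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

Keeping the dependency map in memory means the only filesystem work 
happens when something actually changes, which is the point of the 
proposal.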

I would also think it would improve the speed of Cocoon caches even 
without TUX.  After all, the cache would never even have to check the 
filesystem.  It could just say, "My in-memory bit hasn't been flipped so 
I can assume that nothing has changed without any filesystem calls." 
 Think of the work done so that File.lastModified is called less often. 
 Wouldn't this simply be extending that thought to its logical conclusion?
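
The "in-memory bit" could look roughly like the following Java sketch.  
FlaggedCacheEntry and its methods are hypothetical names, not part of 
Cocoon; the assumption is that some event source calls markStale() when 
a prerequisite changes.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical cache entry whose validity is a flag flipped by change events. */
class FlaggedCacheEntry {
    private final byte[] content;
    private final AtomicBoolean stale = new AtomicBoolean(false);

    FlaggedCacheEntry(byte[] content) {
        this.content = content.clone();
    }

    /** Assumed to be called from the event source when a prerequisite changes. */
    void markStale() {
        stale.set(true);
    }

    /** Validity check: a single in-memory read, no File.lastModified() call. */
    boolean isValid() {
        return !stale.get();
    }

    /** Returns a copy of the cached bytes, or null when the entry must be regenerated. */
    byte[] contentIfValid() {
        return isValid() ? content.clone() : null;
    }
}
```

The trade-off is that correctness now depends entirely on the event 
source never missing a change.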

Linux (2.4 kernel and higher), *BSD, Solaris, and IRIX all support fam. 
 I'm not certain about Solaris, but the others can use fam to interact 
with kernel inode monitors without any regular polling.  I seem to 
remember that Windows has a similar filesystem event callback system, 
but that may have been wishful thinking...  Anyone know for sure one way 
or the other?

Of course I saw the limitations of proposing a TUX-specific option. 
 After all, how many Cocooners out there have even given a second glance 
to TUX -- or even a first?  Aside from its use with Apache through some 
hacking sessions, this idea could also be used for Squid and other 
proxies.  Imagine having front-end proxies that simply serve content 
until they are explicitly told otherwise instead of having to constantly 
make checkup calls.

On the other end, instead of fam, the same might be done for databases 
as well.  For example, PostgreSQL could have a trigger function written 
that, when a dependent set of tables is changed, populates a cache 
expiration queue table with the affected entries, after which a linked C 
function (in the database) could fire off a "cache expired" message to 
the app server.  But of course, this requires that something like Cocoon 
have a socket listener or a constantly open connection to its data input 
source.
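
On the receiving side, the socket listener might be sketched in Java as 
below.  Everything here is an assumption made for illustration: the 
class name ExpirationListener, and especially the one-line-per-message 
wire format "EXPIRE <cache-key>", which is invented for this sketch and 
is not an existing protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

/** Hypothetical listener for "EXPIRE <cache-key>" lines sent by a database-side notifier. */
class ExpirationListener {
    private static final String PREFIX = "EXPIRE ";
    private final Consumer<String> onExpire;

    ExpirationListener(Consumer<String> onExpire) {
        this.onExpire = onExpire;
    }

    /** Reads messages until end of stream; returns how many keys were expired. */
    int handle(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.startsWith(PREFIX)) {
                continue; // unknown messages are ignored in this sketch
            }
            String key = line.substring(PREFIX.length()).trim();
            if (!key.isEmpty()) {
                onExpire.accept(key);
                count++;
            }
        }
        return count;
    }

    /**
     * Accepts one connection at a time and processes its messages.
     * Closing the server socket from another thread ends the loop, either by
     * the isClosed() check or by accept() throwing an IOException.
     * No authentication is done here; a real deployment would need some.
     */
    void serve(ServerSocket server) throws IOException {
        while (!server.isClosed()) {
            try (Socket client = server.accept()) {
                handle(new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
            }
        }
    }
}
```

The callback passed to the constructor is where a cache manager would 
drop the entry for the given key.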

 +----------+---------+  +------------+-----+
 | Database | DB data |  | Filesystem | fam |
 |          | minder  |  |            |     |
 |          +----+----+  |            +---+-+
 +----------+    |       +------------+   |
                 |                        |
                 +------------+-----------+
                              |
+-------------------+---------+------------+
|     Cocoon        | Cocoon Cache Manager |
|                   +-----+----------------+
|                   |     | Cache Exporter |
+-------------------+     +----+-----------+
                               |
                               |
                        +------+------+
                        |             |
          +-------------+--+       +--+-------------+
          |   Filesystem   |       | Network Socket |
          +----------------+       +-------+--------+
                                           |
             +---------+             +-----+-----+
             |   TUX   |             |   Squid   |
             +---------+             +-----------+


Anyway, something like that.  I hope my meager ASCII art skills can at 
least get across the basic premise.  What I think this entails includes 
(1) better knowledge of how the Cocoon cache currently works than I have 
at this moment, (2) a cache event generation interface (implemented by a 
filesystem monitoring object that uses fam or similar technology or a 
custom database trigger and Cocoon-side socket listener), (3) a cache 
exporter interface (implemented by a filesystem serializer or 
Squid-cache-aware network component), and a heaping bowl of programming 
chutzpah.
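
Items (2) and (3) could be sketched as Java interfaces along these 
lines.  All names (CacheEventListener, CacheEventSource, CacheExporter, 
ProactiveCacheManager) are hypothetical and do not correspond to 
existing Cocoon interfaces; the manager is only there to show how the 
two sides would meet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** (2) Receives "this cache key is no longer valid" events. */
interface CacheEventListener {
    void expired(String cacheKey);
}

/** (2) Produces expiration events: a fam wrapper, a database socket listener, etc. */
interface CacheEventSource {
    void addListener(CacheEventListener listener);
}

/** (3) Pushes cache contents to a front end: files for TUX, a network message for Squid, etc. */
interface CacheExporter {
    void export(String cacheKey, byte[] content);
    void remove(String cacheKey);
}

/** Hypothetical manager: stores entries, exports them, and withdraws them on expiry. */
class ProactiveCacheManager implements CacheEventListener {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();
    private final List<CacheExporter> exporters;

    ProactiveCacheManager(List<CacheExporter> exporters) {
        this.exporters = new ArrayList<>(exporters);
    }

    /** Cache a generated page and hand it to every exporter. */
    void put(String cacheKey, byte[] content) {
        entries.put(cacheKey, content);
        for (CacheExporter exporter : exporters) {
            exporter.export(cacheKey, content);
        }
    }

    boolean contains(String cacheKey) {
        return entries.containsKey(cacheKey);
    }

    /** Drop the entry and tell every exporter to withdraw it. */
    @Override
    public void expired(String cacheKey) {
        if (entries.remove(cacheKey) != null) {
            for (CacheExporter exporter : exporters) {
                exporter.remove(cacheKey);
            }
        }
    }
}
```

Wiring is then one call per event source, for example 
source.addListener(manager), so filesystem and database monitors can 
feed the same manager without it knowing which is which.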

Obviously my focus would be on getting TUX to work rather than on Squid 
as I use the former and not the latter.  What it really comes down to is 
that Squid means another box (or boxes) to purchase and maintain, and I am 
cheap/poor/bored (take your pick).  But I would just as soon make 
interfaces that weren't intimately tied to the filesystem and my 
specific goals (hence the discussion about Squid and databases).

Of course, if Sun added a filesystem callback interface to the I/O 
libraries, the native code wouldn't be necessary.  ;-)  Sure, not every 
OS supports it, but then not every platform supports threads either. 
 When not present, you emulate, right?

Any thoughts?  Am I absolutely off my rocker and in need of medication? 
 Is it something that you are already working on?  I've given my 
thoughts on the advantages of a proactive cache.  What are the drawbacks 
of a proactive cache compared to a reactive one (other than the loss of 
simplicity and the fact that the reactive one is already there and works 
-- already got that)?

More info on fam: http://oss.sgi.com/projects/fam/faq.html
More info on TUX: http://www.redhat.com/docs/manuals/tux/TUX-2.2-Manual/


Thanks for your time,

Miles



---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

Re: A lurker's RT -- proactive cache

Posted by "Marcelo F. Ochoa" <mo...@ieee.org>.

Miles Elam wrote:

> <snip>
>
> On the other end, instead of fam, the same might be done for databases 
> as well.  For example, PostgreSQL could have a trigger function 
> written that, when a dependent set of tables is changed, populates a 
> cache expiration queue table with the affected entries, after which a 
> linked C function (in the database) could fire off a "cache expired" 
> message to the app server.  But of course, this requires that 
> something like Cocoon have a socket listener or a constantly open 
> connection to its data input source. 

  This kind of expiration message is already implemented in the DBPrism 
External Invalidator Server, which works with the ESI 
(http://www.w3.org/TR/esi-invp) invalidation protocol.
   I'll upload a new patch to Cocoon's Bugzilla system with the new 
implementation from the DBPrism beta sources, which at this time are only 
available through CVS access on the DBPrism SourceForge website 
(http://sourceforge.net/projects/dbprism/); follow the "Browse CVS" link 
and look at the directories prism/src/org/apache/cocoon/* .
   The new implementation of InMemoryServerImpl provides support for basic 
and advanced selectors using regular expressions 
(http://www.dbprism.com.ar/dbprism/doc/xdocs/Server.html).
  A typical flow of interaction using a database as the source of 
invalidation messages is shown at 
(http://www.dbprism.com.ar/dbprism/doc/cms/CMS-WebCache.html), but the 
invalidation messages could come from any source, including a simple 
telnet session that contacts the invalidation server and sends a 
message like this:

POST /dbprism/x-dbprism-cache-invalidate HTTP/1.0
Authorization: BASIC aW52YWxpZGF0b3I6aW52YWxpZGF0b3I=
Content-Length: 190

<?xml version="1.0"?>
<INVALIDATION VERSION="WCS-1.0">
  <OBJECT>
    <BASICSELECTOR URI="/docs/samples/xsp/simple_ext.xsp"/>
    <ACTION REMOVALTTL="0"/>
  </OBJECT>
</INVALIDATION>
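
A hedged Java sketch of how a client might assemble the request shown 
above.  The class InvalidationRequest and its methods are hypothetical 
and not part of DBPrism.  Two details are inferred from the sample 
rather than stated in it: the Base64 value in the Authorization header 
decodes to the user/password pair invalidator/invalidator, and the 
Content-Length of 190 matches the body when each of its seven lines ends 
in CRLF.  The sketch only builds the request text; it does not send it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical builder for an invalidation request like the sample above. */
class InvalidationRequest {
    private static final String CRLF = "\r\n";

    /**
     * The XML body, one element per line, each line ending in CRLF.
     * Assumes the URI contains no characters that need XML escaping.
     */
    static String body(String uri, int removalTtl) {
        return String.join(CRLF,
                "<?xml version=\"1.0\"?>",
                "<INVALIDATION VERSION=\"WCS-1.0\">",
                "  <OBJECT>",
                "    <BASICSELECTOR URI=\"" + uri + "\"/>",
                "    <ACTION REMOVALTTL=\"" + removalTtl + "\"/>",
                "  </OBJECT>",
                "</INVALIDATION>") + CRLF;
    }

    /** Full HTTP/1.0 request text: request line, headers, blank line, body. */
    static String build(String user, String password, String uri, int removalTtl) {
        String body = body(uri, removalTtl);
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.ISO_8859_1));
        return "POST /dbprism/x-dbprism-cache-invalidate HTTP/1.0" + CRLF
                + "Authorization: BASIC " + credentials + CRLF
                + "Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + CRLF
                + CRLF
                + body;
    }
}
```

Calling build("invalidator", "invalidator", 
"/docs/samples/xsp/simple_ext.xsp", 0) should reproduce the headers of 
the sample message; the resulting text could then be written to a 
socket connected to the invalidation server.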

  The listening process is a simple XSP page 
(http://www.dbprism.com.ar/dbprism/view-source?filename=/invalidate.xsp) 
which interprets the invalidation message and contacts the invalidation 
server running inside Cocoon.
   DBPrism CMS provides a set of Oracle triggers that synchronize the 
content of the CMS with the content shown by Cocoon, providing an 
average response time similar to that of static XML pages.

>
> <snip>
>
> More info on fam: http://oss.sgi.com/projects/fam/faq.html
> More info on TUX: http://www.redhat.com/docs/manuals/tux/TUX-2.2-Manual/ 

  I'll look at these URLs ;)

>
>
>
> Thanks for your time,
>
> Miles
>
>
  Best regards, Marcelo.

-- 
Marcelo F. Ochoa - mochoa@ieee.org
Do you Know DB Prism? Look @ http://www.plenix.com/dbprism/
More info?
Chapter 21 of the book "Professional XML Databases" (Wrox Press 
http://www.wrox.com/)
Chapter 8 of the book "Oracle & Open Source" (O'Reilly 
http://www.oreilly.com/catalog/oracleopen/)
-----------------------------------------------
Lab. de Sistemas - Fac. de Cs. Exactas - UNICEN
Paraje Arroyo Seco - Campus Universitario
(7000) Tandil - Bs. AS. - Argentina
Te: +54-2293-444430 Fax: +54-2293-444431


Re: A lurker's RT -- proactive cache

Posted by Vadim Gritsenko <va...@verizon.net>.

Miles Elam wrote:

> First, some background: The server's a RedHat box running Cocoon w/ 
> Tomcat and TUX in front.  TUX is a kernel-level, simple web server 
> (basically static content) that shoots pages out of the system page 
> cache at alarming speeds.  With appropriate hardware, it copies data 
> directly to network interface buffers rather than to main memory 
> first, and if it can't serve the request, it passes control of the 
> socket to userland -- in my case, Tomcat/Cocoon on port 8080.  This 
> also has the advantage of allowing Tomcat and the JVM to run as 
> "nobody" since TUX can listen on port 80 and pass requests on to 
> non-privileged ports above 1024.  But I digress...
>
> The initial purpose of this setup was to allow Tomcat/Cocoon the 
> handling of dynamic data and TUX to handle static content -- their 
> respective strong suits.  To put it in perspective, TUX can not only 
> saturate a 100baseT link on meager Pentiums but has extraordinarily low 
> CPU and memory usage.  The less CPU/mem being used for static content, 
> the more Tomcat/Cocoon has for the dynamic stuff.  This is the crux of 
> my current mindset.  As I was working, I found myself wishing that 
> they could have their caches combined -- or rather that TUX could have 
> access to Tomcat/Cocoon's dynamic data cache.  This was dismissed as a 
> pipe dream very quickly; native (kernel!) code tinkering around in a 
> userland JVM instance can hardly be called a well-considered design.
>
> But then I thought, "Why not have Cocoon's cache output files to a 
> directory path that TUX can later serve?"  However, this ran into the 
> problem of cache expiration.  If Cocoon's cache (which from now on 
> I'll refer to as a "reactive cache") were to output a standard, 
> serialized file for TUX to serve, TUX would indeed serve it, but it 
> would no longer let any requests for that page through to 
> Tomcat/Cocoon.  Thus the reactive cache's entries would never be 
> accessed again and so would never be checked or expired.  Stale data 
> would be served forevermore.


Assuming that you solve the expiration issue (as you propose)...  Still, 
this Cocoon usage scenario is not applicable to 90+% of web sites, where 
personalization, authorization, and authentication, or just some 
application logic, are required.  When a request is processed, Cocoon 
components such as Actions, Matchers, and Selectors are executed.  And 
while the last two (usually) do not affect the system's state, the first 
one (usually) does.  This means it is not enough to just serve content; 
the action must be executed as well.

For your scenario, maybe a cron job with a more efficient command-line 
version of Cocoon would be enough.  Or look at simpler solutions 
like murka (http://murka.sourceforge.net/) - it is adequate for 
stateless content generation (XSLT on static XML files).

Vadim



> <snip>


