Posted to dev@cocoon.apache.org by Hannes Haug <Ha...@Haug.com> on 2000/01/24 13:57:09 UTC

don't cache - validate

Hi,

I think we should drop the cache from cocoon. It
will always be slow. Cocoon should concentrate on
validating cache entries.

The only problem is that neither apache's mod_proxy
nor squid supports content negotiation.

 -hh

Re: don't cache - validate

Posted by Mike Williams <mi...@o3.co.uk>.
  >>> On Mon, 24 Jan 2000 18:32:08 +0000,
  >>> "Paul" == Paul Russell <Pa...@uea.ac.uk> wrote:

  Paul> Sure, but shouldn't we be caching rather more than just the output
  Paul> (X)HTML? Translated XSPs, XSLs etc for a start.

Yes!  This makes a lot of sense when dealing with very dynamic content.

Perhaps Cocoon2 should allow the admin to specify what points in the
processing-chain are cachable?

-- 
Mike Williams

Re: don't cache - validate

Posted by Paul Russell <Pa...@uea.ac.uk>.
> Of course caching is good. But doing our own caching in java in the
> servlet engine is not that fast. Let's use a fast cache in front of
> the servlet engine. Cocoon just has to do the cache validation.

Sure, but shouldn't we be caching rather more than just the output
(X)HTML? Translated XSPs, XSLs etc for a start.

Paul

Re: don't cache - validate

Posted by Hannes Haug <Ha...@Haug.com>.
Paul Russell wrote:
> 
> Assuming I understand the intention correctly, at some undefined
> point in the future, we'll be able to generate a page from multiple
> sources. Some of these sources will be static XML documents (menus?
> 'toolbar's? god knows what else..) and some will be dynamic (god
> knows what else ;). Now, shouldn't the framework attempt to avoid
> re-parsing the statics (which for all we know could be huge)? Frankly
> I tend to subscribe to the school of thought that says that all non-
> trivial caching is worthwhile. We can find out how much memory we're
> using, so that's not a problem. I'm half tempted to suggest using
> a separate module as a generic object cache (although in fairness we
> may be doing that already).

Of course caching is good. But doing our own caching in java in the
servlet engine is not that fast. Let's use a fast cache in front of
the servlet engine. Cocoon just has to do the cache validation.
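
The "Cocoon only validates" idea can be sketched with an HTTP/1.1 conditional request: the front cache sends If-Modified-Since, and the servlet side answers 304 or regenerates. This is a minimal sketch under assumptions, not Cocoon code: the class name CacheValidator and its method are hypothetical, and it assumes the source's last-modified time is known.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Cocoon: decides whether a front cache's
// stored copy is still valid, so the page need not be regenerated.
final class CacheValidator {
    static final int OK = 200;            // regenerate and send the full body
    static final int NOT_MODIFIED = 304;  // the cache may reuse its stored copy

    // ifModifiedSince: raw header value sent by the cache, or null if absent.
    // sourceLastModified: when the underlying source last changed.
    static int validate(String ifModifiedSince, Instant sourceLastModified) {
        if (ifModifiedSince == null) {
            return OK; // unconditional request
        }
        try {
            Instant cached = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            // HTTP dates have one-second resolution, so truncate before comparing.
            Instant source = sourceLastModified.truncatedTo(ChronoUnit.SECONDS);
            return source.isAfter(cached) ? OK : NOT_MODIFIED;
        } catch (DateTimeParseException e) {
            return OK; // unparseable date: ignore the header and send the body
        }
    }
}
```

For a page built from several sources, sourceLastModified would presumably have to be the newest timestamp among all of them.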

 -hh

Re: don't cache - validate

Posted by Paul Russell <Pa...@uea.ac.uk>.
> I think we should drop the cache from cocoon. It
> will always be slow. Cocoon should concentrate on
> validating cache entries.

Err... I don't.

Assuming I understand the intention correctly, at some undefined
point in the future, we'll be able to generate a page from multiple
sources. Some of these sources will be static XML documents (menus?
'toolbar's? god knows what else..) and some will be dynamic (god
knows what else ;). Now, shouldn't the framework attempt to avoid
re-parsing the statics (which for all we know could be huge)? Frankly
I tend to subscribe to the school of thought that says that all non-
trivial caching is worthwhile. We can find out how much memory we're
using, so that's not a problem. I'm half tempted to suggest using
a separate module as a generic object cache (although in fairness we
may be doing that already).
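
The point about not re-parsing static documents can be sketched as a small bounded object cache. This is only an illustration: ObjectCache is a hypothetical name (not the Avalon or Cocoon API), and it bounds the cache by entry count rather than by measured memory, which is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generic object cache; not an Avalon or Cocoon class.
// Holds at most maxEntries objects and evicts the least recently used one.
final class ObjectCache<K, V> {
    private final Map<K, V> entries;

    ObjectCache(final int maxEntries) {
        // accessOrder=true keeps the least recently accessed entry first.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached object, or null if it was never stored or was evicted.
    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized void put(K key, V value) {
        entries.put(key, value);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

A caller would look up a parsed document by its source name and only parse and put it on a miss.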

Thoughts...?


Paul

Re: don't cache - validate

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Hannes Haug wrote:
> 
> A general cache in mod_jserv or a general mod_cache for
> mod_jserv, mod_perl, php, mod_cgi, mod_fastcgi, ... ?
> 
> That's fine. But I wouldn't call it Cocoon Cache.

I don't know; it's a rough (really rough) idea right now. I'll try to
sketch it out over the next few days...

	Pier

-- 
--------------------------------------------------------------------
-          P              I              E              R          -
stable structure erected over water to allow the docking of seacraft
<ma...@betaversion.org>    <http://www.betaversion.org/~pier/>
--------------------------------------------------------------------
- ApacheCON Y2K: Come to the official Apache developers conference -
-------------------- <http://www.apachecon.com> --------------------

Re: don't cache - validate

Posted by Hannes Haug <Ha...@Haug.com>.
Pierpaolo Fumagalli wrote:
> 
> I'm talking about something like this:
> 
> 1st request:
> 
> > Client -> Web Server -> Servlet Engine -> Cocoon -> Web Server -> Client
> >                                             |
> >                                             +---------> Cocoon Cache
> 
> 2nd request:
> 
> > Client -> Web Server -+-> Client
> >                       |
> >         Cocoon Cache -+

A general cache in mod_jserv or a general mod_cache for
mod_jserv, mod_perl, php, mod_cgi, mod_fastcgi, ... ?

That's fine. But I wouldn't call it Cocoon Cache.

  -hh

Re: don't cache - validate

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Hannes Haug wrote:
> 
> Pierpaolo Fumagalli wrote:
> >
> > The cache IS essential. We're not talking about proxying issues (these
> > are totally unrelated) but about caching those things that don't need to
> > be re-generated. And in this case, caching is a must.
> 
> We could use proxies for caching.
> Of course they are not totally unrelated.
> 
> See http://www.squid-cache.org/Doc/FAQ/FAQ-20.html and
> ftp://ftp.isi.edu/in-notes/rfc2616.txt section 13 "Caching in HTTP"
> 
> Information goes more or less this way:
> file system -> servlet engine -> web server -> browser
> Why should we cache the servlet engine stuff if
> we can cache the web server or even browser stuff?

I'm talking about something like this:

1st request:

> Client -> Web Server -> Servlet Engine -> Cocoon -> Web Server -> Client
>                                             |
>                                             +---------> Cocoon Cache

2nd request:

> Client -> Web Server -+-> Client 
>                       |
>         Cocoon Cache -+

	Pier


Re: don't cache - validate

Posted by Stefano Mazzocchi <st...@apache.org>.
Hannes Haug wrote:
> 
> Pierpaolo Fumagalli wrote:
> >
> > The cache IS essential. We're not talking about proxying issues (these
> > are totally unrelated) but about caching those things that don't need to
> > be re-generated. And in this case, caching is a must.
> 
> We could use proxies for caching.
> Of course they are not totally unrelated.
> 
> See http://www.squid-cache.org/Doc/FAQ/FAQ-20.html and
> ftp://ftp.isi.edu/in-notes/rfc2616.txt section 13 "Caching in HTTP"
> 
> Information goes more or less this way:
> file system -> servlet engine -> web server -> browser
> Why should we cache the servlet engine stuff if
> we can cache the web server or even browser stuff?

Because only Cocoon knows (hell, not even Cocoon sometimes! see external
entities) if the cached data is valid.

I'm more than willing to add http headers to improve external caching,
but this does not reduce the need for an internal cache at all.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Come to the first official Apache Software Foundation Conference!  
------------------------- http://ApacheCon.Com ---------------------



Re: don't cache - validate

Posted by Hannes Haug <Ha...@Haug.com>.
Pierpaolo Fumagalli wrote:
> 
> The cache IS essential. We're not talking about proxying issues (these
> are totally unrelated) but about caching those things that don't need to
> be re-generated. And in this case, caching is a must.

We could use proxies for caching.
Of course they are not totally unrelated.

See http://www.squid-cache.org/Doc/FAQ/FAQ-20.html and
ftp://ftp.isi.edu/in-notes/rfc2616.txt section 13 "Caching in HTTP"

Information goes more or less this way:
file system -> servlet engine -> web server -> browser
Why should we cache the servlet engine stuff if
we can cache the web server or even browser stuff?

 -hh

Re: don't cache - validate

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Hannes Haug wrote:
> 
> Hi,
> 
> I think we should drop the cache from cocoon. It
> will always be slow. Cocoon should concentrate on
> validating cache entries.

??????????????????????????????????????????????????

> The only problem is that neither apache's mod_proxy
> nor squid supports content negotiation.

The cache IS essential. We're not talking about proxying issues (these
are totally unrelated) but about caching those things that don't need to
be re-generated. And in this case, caching is a must.

	Pier


Re: don't cache - validate

Posted by Wong Kok Wai <wo...@pacific.net.sg>.
ObjectStore, the OODB from Object Design (http://www.odi.com)?

Pierpaolo Fumagalli wrote:

>
> I think someone already invented it, the name is (again, ufff) Stefano
> Mazzocchi and the thing is called ObjectStore, part of Avalon
> (java.apache.org), I believe :) :) :)
>
>         Pier
>


Re: don't cache - validate

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Paul Russell wrote:
> 
> On Mon, Jan 24, 2000 at 02:53:42PM -0800, Pierpaolo Fumagalli wrote:
> > I think someone already invented it, the name is (again, ufff) Stefano
> > Mazzocchi and the thing is called ObjectStore, part of Avalon
> > (java.apache.org), I believe :) :) :)
> 
> heh! Repeat after me - 1, 2, 3, d'oh! Okay, that's good news. Is there
> any intention for cocoon to use it? (or does it already, and I've missed
> that?)

Cocoon 2.0, object store and others :) Get the interfaces out of CVS!

	Pier


Re: don't cache - validate

Posted by Paul Russell <Pa...@uea.ac.uk>.
On Mon, Jan 24, 2000 at 02:53:42PM -0800, Pierpaolo Fumagalli wrote:
> I think someone already invented it, the name is (again, ufff) Stefano
> Mazzocchi and the thing is called ObjectStore, part of Avalon
> (java.apache.org), I believe :) :) :)

heh! Repeat after me - 1, 2, 3, d'oh! Okay, that's good news. Is there
any intention for cocoon to use it? (or does it already, and I've missed
that?)

Paul

Re: don't cache - validate

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Paul Russell wrote:
> 
> I *still* stand by the fact that we need a cache for the actual objects
> in the content generation phase. Java is not *that* slow (rivals C++
> with a decent jvm from what I've seen - and I'm a C++ programmer!)
> There is no theoretical reason why java should be any slower than C at
> all - it could potentially be faster in fact because you can do all
> sorts of runtime optimisation with the jvm (in a similar way to the
> transmeta processor's optimiser).

Yep.. Agreed..

> Perhaps it's about time that someone wrote a generic object caching
> architecture - I suspect we're not the only ones who could do with
> one.

I think someone already invented it, the name is (again, ufff) Stefano
Mazzocchi and the thing is called ObjectStore, part of Avalon
(java.apache.org), I believe :) :) :)

	Pier


Re: don't cache - validate

Posted by Hannes Haug <Ha...@Haug.com>.
Paul Russell wrote:
> 
> On Tue, Jan 25, 2000 at 12:01:17AM +0100, Hannes Haug wrote:
> > The problem is not the efficiency of compiled code. It's a problem
> > of the libraries. And java's io api sucks in some areas. Look at
> > http://linux.wiw.org/doc/man/man.cgi?man=sendfile&sec=0
> > Can something like this be done in pure java?
> 
> No, you're right, given the speed that sendfile can operate it would
> be difficult to compete using pure java, [...]
     ^^^^^^^^^

impossible even to come close

 -hh

Re: don't cache - validate

Posted by Paul Russell <Pa...@uea.ac.uk>.
On Tue, Jan 25, 2000 at 12:01:17AM +0100, Hannes Haug wrote:
> The problem is not the efficiency of compiled code. It's a problem
> of the libraries. And java's io api sucks in some areas. Look at
> http://linux.wiw.org/doc/man/man.cgi?man=sendfile&sec=0
> Can something like this be done in pure java?

No, you're right: given the speed at which sendfile can operate, it would
be difficult to compete using pure java, because sendfile uses underlying
features of the operating system to speed itself up (it is implemented
as part of the kernel, and works at the inode level). Having said that,
I still think java is plenty fast enough to make caching compiled
objects and output feasible and worthwhile.


Paul

Re: don't cache - validate

Posted by Hannes Haug <Ha...@Haug.com>.
Paul Russell wrote:
> 
> I *still* stand by the fact that we need a cache for the actual objects
> in the content generation phase.

That's ok.

> Java is not *that* slow (rivals C++ with a decent jvm from what I've
> seen - and I'm a C++ programmer!) There is no theoretical reason why
> java should be any slower than C at all - it could potentially be
> faster in fact because you can do all sorts of runtime optimisation
> with the jvm (in a similar way to the transmeta processor's optimiser).

The problem is not the efficiency of compiled code. It's a problem
of the libraries. And java's io api sucks in some areas. Look at
http://linux.wiw.org/doc/man/man.cgi?man=sendfile&sec=0
Can something like this be done in pure java?
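
A partial answer arrived with later Java releases: java.nio (Java 1.4 and up) has FileChannel.transferTo, whose documentation notes that many operating systems can move bytes from the filesystem cache to the target channel without copying them. Whether a given JVM maps this onto sendfile depends on the platform and the target channel, so the following is only a sketch. FileSender is a hypothetical name, and the loop assumes a blocking target.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: streams a cached file to a channel (for example a
// socket channel) and lets the JVM pick the most direct transfer it can.
final class FileSender {
    // Sends the whole file to target and returns the number of bytes sent.
    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop.
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```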

 -hh

Re: don't cache - validate

Posted by Paul Russell <Pa...@uea.ac.uk>.
On Mon, Jan 24, 2000 at 11:17:17PM +0100, Hannes Haug wrote:
> Stefano Mazzocchi wrote:
> > Pier told me about an idea about a mod_cocoon that would do a cache
> > frontend talking directly between apache and Cocoon, but I think this
> > should be done at a servlet engine level, not at a servlet level.
> Perhaps it's better to extend mod_proxy than to reinvent the wheel.

I *still* stand by the fact that we need a cache for the actual objects
in the content generation phase. Java is not *that* slow (rivals C++
with a decent jvm from what I've seen - and I'm a C++ programmer!)
There is no theoretical reason why java should be any slower than C at
all - it could potentially be faster in fact because you can do all
sorts of runtime optimisation with the jvm (in a similar way to the
transmeta processor's optimiser).

Perhaps it's about time that someone wrote a generic object caching
architecture - I suspect we're not the only ones who could do with
one.


Paul

Re: don't cache - validate

Posted by Hannes Haug <Ha...@Haug.com>.
Stefano Mazzocchi wrote:
> 
> Pier told me about an idea about a mod_cocoon that would do a cache
> frontend talking directly between apache and Cocoon, but I think this
> should be done at a servlet engine level, not at a servlet level.

mod_cocoon is almost there, it's called mod_proxy.
It's not full http 1.1, it doesn't support content
negotiation, but it works for the mod_perl guys. See
http://perl.apache.org/guide/strategy.html#Adding_a_Proxy_Server_in_http_Ac

Perhaps it's better to extend mod_proxy than to reinvent the wheel.

 -hh

Re: don't cache - validate

Posted by Hannes Haug <Ha...@Haug.com>.
Some more links

mod_proxy stuff in the modperl archive:
http://forum.swarthmore.edu/search/epi_results.html?textsearch=mod_proxy&ctrlfile=epigone/modperl.ctrl

Ilya Obshadko has a patch that fixes some of mod_proxy's
problems for dynamic content caching:
http://forum.swarthmore.edu/epigone/modperl/plimgehza/13833.000121@zhurnal.ru

  -hh

Re: don't cache - validate

Posted by Pierpaolo Fumagalli <pi...@apache.org>.
Stefano Mazzocchi wrote:
> 
> Pier told me about an idea about a mod_cocoon that would do a cache
> frontend talking directly between apache and Cocoon, but I think this
> should be done at a servlet engine level, not at a servlet level.

As I already wrote... I'm working on the rough sketch for a dynamic HTTP
caching mechanism, that could be integrated in a servlet (it could be
integrated in the Servlet Engine also, but I'm waiting to see the
evolutionary path of tomcat before hacking into the sources).
Basically, the idea is to have a request monitor that intercepts each
request and builds a hash of it. If the hash matches one generated for a
previous request, the results are served from the cache without even
touching or executing the logic.
This mechanism can be built in a servlet (as it's now), in the servlet
engine (waiting for tomcat), or in the web server (mod_cocoon, but
preferably mod_dyncache).
But that is still not connected to Cocoon...
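
The request-monitor idea might look roughly like this. It is a sketch under assumptions, not the mechanism described above as actually built: RequestHashCache is a hypothetical name, and the choice of SHA-256 over the method, URI and sorted parameters is an illustrative guess at what "a hash of the request" could cover.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical request monitor: identical requests get the stored response
// and the generating logic is not executed again.
final class RequestHashCache {
    private final Map<String, byte[]> responses = new ConcurrentHashMap<>();

    // Builds a hex digest of the request; parameters are sorted so that
    // their order in the query string does not change the hash.
    static String hash(String method, String uri, Map<String, String> params) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder key = new StringBuilder(method).append('\n').append(uri);
            for (Map.Entry<String, String> p : new TreeMap<>(params).entrySet()) {
                key.append('\n').append(p.getKey()).append('=').append(p.getValue());
            }
            byte[] raw = digest.digest(key.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Returns the stored response for this hash, running generator on a miss.
    byte[] serve(String requestHash, Supplier<byte[]> generator) {
        return responses.computeIfAbsent(requestHash, h -> generator.get());
    }
}
```

Note that this only matches requests; it does nothing to detect that the underlying sources have changed, so it would still need some form of validation or expiry.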

My 0.02$

	Pier


Re: don't cache - validate

Posted by Stefano Mazzocchi <st...@apache.org>.
Hannes Haug wrote:
> 
> Hi,
> 
> I think we should drop the cache from cocoon. It
> will always be slow. Cocoon should concentrate on
> validating cache entries.
> 
> The only problem is that neither apache's mod_proxy
> nor squid supports content negotiation.

Hannes, 

you're right in saying that Cocoon will always be slower than internal
server caching, no doubt about that, and I'd be very happy to delegate
the final caching to something else, really.

I'm wide open to suggestions here.

Pier told me about an idea about a mod_cocoon that would do a cache
frontend talking directly between apache and Cocoon, but I think this
should be done at a servlet engine level, not at a servlet level.

Comments?
