Posted to axkit-dev@xml.apache.org by Matt Sergeant <ma...@sergeant.org> on 2003/04/12 20:31:05 UTC

Cache as a Language (pipeline) module

I've been thinking. Maybe the cache should just be another pipeline 
module, rather than some "magic" thing that tries very very hard to figure 
out if it should be applied or not (and the "very very" part makes it 
slow).

This would probably be a 2.0 thing.

Am I crazy?

Re: Cache as a Language (pipeline) module

Posted by Robin Berjon <ro...@knowscape.com>.

Matt Sergeant wrote:
> I've been thinking. Maybe the cache should just be another pipeline 
> module, rather than some "magic" thing that tries very very hard to figure 
> out if it should be applied or not (and the "very very" part makes it 
> slow).
> 
> This would probably be a 2.0 thing.
> 
> Am I crazy?

+1 from me. The pipeline manager should really be some sort of agnostic kernel, 
almost dumb, and everything smart should be a module called by it. I say 
"should" not as in "that's a cleaner design according to textbooks" but as in 
"it seems that part of our complexity comes from making that bit too smart".

--r

Re: Cache as a Language (pipeline) module

Posted by Pavel Penchev <pa...@tkzs.org>.

From a user's perspective (I'm mostly an AxKit user) I think it would be
great. It would make life much easier for people writing cache modules, since
having the cache as a separate part of the pipeline is the logical thing.

Just a question: is there a way for this to be a 1.7 thing? My knowledge of
AxKit internals is limited, but isn't AxKit.pm the only place where cache
checks are done? If that is the case, backward compatibility should be
relatively easy to achieve.

Pavel


----- Original Message -----
From: "Matt Sergeant" <ma...@sergeant.org>
To: <ax...@xml.apache.org>
Sent: Saturday, April 12, 2003 9:31 PM
Subject: Cache as a Language (pipeline) module


> I've been thinking. Maybe the cache should just be another pipeline
> module, rather than some "magic" thing that tries very very hard to figure
> out if it should be applied or not (and the "very very" part makes it
> slow).
>
> This would probably be a 2.0 thing.
>
> Am I crazy?
>
>

Re: Cache as a Language (pipeline) module

Posted by Robin Berjon <ro...@knowscape.com>.

Chris Leishman wrote:
> Hmm....didn't catch that one in my incr. cache patches - I think I've 
> probably cut it out.  Have to add it back later I guess.
> 
> That isn't a very good general solution though.....eg. it wouldn't work 
> with incr. caching.

It could be made to with a 'passthru_next' (or even passthrough...), one could 
then just interleave cache modules with processing modules and get incremental 
caching.

--r

Re: Cache as a Language (pipeline) module

Posted by Robin Berjon <ro...@knowscape.com>.

Chris Leishman wrote:
> On Monday, April 14, 2003, at 07:02 PM, Matt Sergeant wrote:
>> This is not totally coherent yet, but hopefully it's getting closer.
> 
> I think it's a nice thought, but there's going to be a lot of caveats.
> 
> - If the cache "stage" is towards the end of the pipeline, it's still 
> going to have to check all the 'dependencies' before it so the overhead 
> is the same.

If the same operation with the same features has the same overhead, then that's 
not much of a caveat :)

> If it's towards the start then the cache isn't going to be 
> very effective since the later stages will always have to be run.

As above.

> - Unless there are multiple cache "stages" then there's going to be 
> the same issue of forcing a re-run of the entire pipeline if anything 
> changes anywhere.  If there are multiple stages to avoid this, then the 
> overhead is similar to that of incremental caching.

As above.

> - Users are going to have to have detailed knowledge of how elements in 
> the pipeline work in order to place cache points appropriately, and to 
> be able to specify what affects the caching.
> 
> I'll be blunt - implementing it in this way would cause a lot of 
> confusion for users and would most likely result in a lot of people 
> doing it the wrong way.  You'd be forcing the users to deal with 
> something that can be handled internally for the most part.  The only 
> benefit would be the ability to optimize for specific scenarios (what I 
> would consider as rather premature optimization).

That's not very blunt :)

Since we have to maintain some backwards compatibility, I'd add to Matt's 
description that configuring a pipeline with no cache step means a cache step 
with the present behaviour is magically inserted. Good magic, users don't see 
the difference.

And for those that will want to cache their way, then it'll be a little more 
complex but that's to be expected, after all they're doing something extra. I 
know I've had quite a number of XSPs that could have been cached on query string 
+ a touch file touched every time a db was updated. Those could have benefitted 
from smarter caching, and I wouldn't call that premature optimisation.
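
A minimal sketch of those two ideas, written in Python for illustration only
(AxKit itself is Perl, and none of these names exist in AxKit).
`with_default_cache` appends a default cache step when the configured pipeline
has none, and `touch_file_key` builds a cache key from the query string plus
the mtime of a touch file. The stage names, and the convention that cache steps
have names starting with "cache", are assumptions.

```python
import os


def with_default_cache(stage_names, default="cache"):
    """Append a default cache step when the configured pipeline names none.

    Assumption: any stage whose name starts with "cache" counts as a cache step.
    """
    if any(name.startswith("cache") for name in stage_names):
        return list(stage_names)  # user placed the cache explicitly; leave it alone
    return list(stage_names) + [default]  # present behaviour: cache at the end


def touch_file_key(query_string, touch_path):
    """Cache key that changes when the query string or the touch file's mtime changes."""
    try:
        stamp = os.stat(touch_path).st_mtime_ns
    except FileNotFoundError:
        stamp = 0  # no touch file yet: treat the data as never updated
    return f"{query_string}@{stamp}"
```

With this shape, code that updates the database only has to touch the file;
every key derived from the old mtime then stops matching.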

--r

Re: Cache as a Language (pipeline) module

Posted by Chris Leishman <ch...@leishman.org>.

On Monday, April 14, 2003, at 07:02 PM, Matt Sergeant wrote:
<snip>
> OK, at the moment we cache at the end, prior to delivery. The cache has to
> try very hard to figure out if it should be applied or not (by checking
> mtime on every single resource involved in the transaction). This works
> well for direct pipelines of XSLT -> XSLT -> XSLT, but sucks for anything
> else.
>
> For anything involving XSP, it means you have no control. Caching is off.
>
> With incremental caching you get the magic of the current cache
> implementation, but happening at all stages in the pipeline. That can only
> be slower.
>
> What I'm trying to say is that I think I designed the caching system
> wrong, or at least "too smart". Instead I'd prefer the user to decide when
> the cache gets used (witness also the confusion about the cache
> being used despite changes in querystring). The sensible place for this to
> occur is another "stage" in the pipeline. So when you design your pipeline
> you choose where the caching occurs (hence no need for the incremental
> caching stuff as the cache becomes a manual thing). You also choose what
> things (other than files) affect the cache - like TTL, querystring, POST
> params, etc.
>
> This is not totally coherent yet, but hopefully it's getting closer.

I think it's a nice thought, but there's going to be a lot of caveats.

- If the cache "stage" is towards the end of the pipeline, it's still 
going to have to check all the 'dependencies' before it so the overhead 
is the same.  If it's towards the start then the cache isn't going to 
be very effective since the later stages will always have to be run.

- Unless there are multiple cache "stages" then there's going to be 
the same issue of forcing a re-run of the entire pipeline if anything 
changes anywhere.  If there are multiple stages to avoid this, then the 
overhead is similar to that of incremental caching.

- Users are going to have to have detailed knowledge of how elements in 
the pipeline work in order to place cache points appropriately, and to 
be able to specify what affects the caching.

I'll be blunt - implementing it in this way would cause a lot of 
confusion for users and would most likely result in a lot of people 
doing it the wrong way.  You'd be forcing the users to deal with 
something that can be handled internally for the most part.  The only 
benefit would be the ability to optimize for specific scenarios (what I 
would consider as rather premature optimization).

Ideally the caching should be done the same way Cocoon does it - as 
part of each "stage" element.  Each stage is responsible for its own 
dependency checking and caching.  Each "stage" can then take options 
from the user to manipulate the way it performs the caching.  In this 
situation it is still possible to optimize the caching by manipulating 
each "stage" and by also coding the pipeline to be able to calculate 
those caching points that are redundant (eg "stages" that are followed 
by another "stage" that depends only upon its input).
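
One way to read the "redundant caching points" rule, sketched in Python for
illustration only (this is neither Cocoon nor AxKit code, and `Stage`,
`external_deps` and `useful_cache_points` are hypothetical names). The
assumption is that a stage with no external dependencies depends only on its
input, so a cache point directly before it adds nothing: its own cached output
is valid exactly when the previous stage's would be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    # Things other than the input that affect the output (stylesheets, query
    # string, ...). Empty means the stage depends only on its input.
    external_deps: tuple = ()


def useful_cache_points(stages):
    """Return the names of stages whose output is worth caching.

    The cache point after stage i is dropped when stage i+1 depends only on
    its input; the last stage always keeps its cache point.
    """
    keep = []
    for i, stage in enumerate(stages):
        is_last = i == len(stages) - 1
        if is_last or stages[i + 1].external_deps:
            keep.append(stage.name)
    return keep
```

For a pipeline of an XSP page, a pure filter and an XSLT stage with its own
stylesheet, this keeps cache points after the filter and after the XSLT stage
and drops the one after the XSP page.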

Regards,
Chris

Re: Cache as a Language (pipeline) module

Posted by Matt Sergeant <ma...@sergeant.org>.

On Mon, 14 Apr 2003, Chris Leishman wrote:

>
> On Monday, April 14, 2003, at 10:07 AM, Matt Sergeant wrote:
>
> > On Sun, 13 Apr 2003, Chris Leishman wrote:
> >
> >> That isn't a very good general solution though.....eg. it wouldn't work
> >> with incr. caching.
> >
> > With Cache as a pipeline module (aka Language module) you wouldn't need
> > the incr caching stuff.
>
> I think you need to explain the idea in more depth.....

OK, at the moment we cache at the end, prior to delivery. The cache has to
try very hard to figure out if it should be applied or not (by checking
mtime on every single resource involved in the transaction). This works
well for direct pipelines of XSLT -> XSLT -> XSLT, but sucks for anything
else.

For anything involving XSP, it means you have no control. Caching is off.

With incremental caching you get the magic of the current cache
implementation, but happening at all stages in the pipeline. That can only
be slower.

What I'm trying to say is that I think I designed the caching system
wrong, or at least "too smart". Instead I'd prefer the user to decide when
the cache gets used (witness also the confusion about the cache
being used despite changes in querystring). The sensible place for this to
occur is another "stage" in the pipeline. So when you design your pipeline
you choose where the caching occurs (hence no need for the incremental
caching stuff as the cache becomes a manual thing). You also choose what
things (other than files) affect the cache - like TTL, querystring, POST
params, etc.

This is not totally coherent yet, but hopefully it's getting closer.
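
To make the idea concrete, here is a rough sketch in Python (illustration
only, not AxKit code; `CachePoint`, `run_pipeline`, `key_parts` and the request
dictionary are all made-up names). The cache is just another entry in the list
of stages, the pipeline author chooses where it sits, and the author also
chooses which request fields and what TTL affect it.

```python
import hashlib
import time


class CachePoint:
    """Hypothetical cache stage: stores whatever the stages before it produced."""

    def __init__(self, key_parts=("uri",), ttl=None, clock=time.time):
        self.key_parts = key_parts  # request fields the user says affect the cache
        self.ttl = ttl              # lifetime in seconds, None means no expiry
        self.clock = clock          # injectable so expiry can be tested
        self.store = {}             # cache key -> (stored_at, data)

    def _key(self, request):
        raw = "|".join(f"{p}={request.get(p, '')}" for p in self.key_parts)
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def lookup(self, request):
        entry = self.store.get(self._key(request))
        if entry is None:
            return None
        stored_at, data = entry
        if self.ttl is not None and self.clock() - stored_at >= self.ttl:
            return None  # expired
        return data

    def save(self, request, data):
        self.store[self._key(request)] = (self.clock(), data)


def run_pipeline(stages, request, source):
    """Resume after the last cache point holding a fresh entry, else run everything."""
    start, data = 0, source
    for i in range(len(stages) - 1, -1, -1):
        if isinstance(stages[i], CachePoint):
            cached = stages[i].lookup(request)
            if cached is not None:
                start, data = i + 1, cached
                break
    for stage in stages[start:]:
        if isinstance(stage, CachePoint):
            stage.save(request, data)
        else:
            data = stage(request, data)
    return data
```

No mtime checks happen here unless the user asks for them: a hit is decided
purely by the chosen key fields and the TTL.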

-- 
<!-- Matt -->
<:->get a SMart net</:->
Spam trap - do not mail: spam-sig@spamtrap.messagelabs.com

Re: Cache as a Language (pipeline) module

Posted by Chris Leishman <ch...@leishman.org>.

On Monday, April 14, 2003, at 10:07 AM, Matt Sergeant wrote:

> On Sun, 13 Apr 2003, Chris Leishman wrote:
>
>> That isn't a very good general solution though.....eg. it wouldn't work
>> with incr. caching.
>
> With Cache as a pipeline module (aka Language module) you wouldn't need
> the incr caching stuff.

I think you need to explain the idea in more depth.....

Regards,
Chris

Re: Cache as a Language (pipeline) module

Posted by Matt Sergeant <ma...@sergeant.org>.

On Sun, 13 Apr 2003, Chris Leishman wrote:

> That isn't a very good general solution though.....eg. it wouldn't work
> with incr. caching.

With Cache as a pipeline module (aka Language module) you wouldn't need
the incr caching stuff.

Re: Cache as a Language (pipeline) module

Posted by Chris Leishman <ch...@leishman.org>.

On Sunday, April 13, 2003, at 11:10 AM, Matt Sergeant wrote:

> On Sat, 12 Apr 2003, Chris Leishman wrote:
>
>>
>> On Saturday, April 12, 2003, at 09:31 PM, Matt Sergeant wrote:
>>
>>> I've been thinking. Maybe the cache should just be another pipeline
>>> module, rather than some "magic" thing that tries very very hard to figure
>>> out if it should be applied or not (and the "very very" part makes it
>>> slow).
>>
>> Only catch is that the cache has to control whether other pipeline
>> modules get run...so there would have to be some sort of skip logic...
>
> There already is - $r->pnotes('passthru' => 1) at runtime makes this the
> last in the chain.

Oh?

Hmm....didn't catch that one in my incr. cache patches - I think I've 
probably cut it out.  Have to add it back later I guess.

That isn't a very good general solution though.....eg. it wouldn't work 
with incr. caching.

Regards,
Chris

Re: Cache as a Language (pipeline) module

Posted by Matt Sergeant <ma...@sergeant.org>.

On Sat, 12 Apr 2003, Chris Leishman wrote:

>
> On Saturday, April 12, 2003, at 09:31 PM, Matt Sergeant wrote:
>
> > I've been thinking. Maybe the cache should just be another pipeline
> > module, rather than some "magic" thing that tries very very hard to figure
> > out if it should be applied or not (and the "very very" part makes it
> > slow).
>
> Only catch is that the cache has to control whether other pipeline
> modules get run...so there would have to be some sort of skip logic...

There already is - $r->pnotes('passthru' => 1) at runtime makes this the
last in the chain.
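
The effect of that flag can be sketched as follows, in Python and for
illustration only (this is not how AxKit implements it; `run_chain`,
`make_cache_stage` and the `notes` dictionary standing in for pnotes are
hypothetical). A cache stage placed first sets the flag on a hit, and the
chain stops after it.

```python
def run_chain(stages, notes, data):
    """Run stages in order; stop after any stage that sets notes['passthru']."""
    for stage in stages:
        data = stage(notes, data)
        if notes.get("passthru"):
            break  # this stage declared itself the last in the chain
    return data


def make_cache_stage(store, key):
    """Hypothetical cache stage: on a hit it returns the stored output and
    marks itself as the last stage, so later modules are skipped."""
    def cache_stage(notes, data):
        if key in store:
            notes["passthru"] = 1
            return store[key]
        return data  # miss: hand the input on unchanged
    return cache_stage
```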

Re: Cache as a Language (pipeline) module

Posted by Chris Leishman <ch...@leishman.org>.

On Saturday, April 12, 2003, at 09:31 PM, Matt Sergeant wrote:

> I've been thinking. Maybe the cache should just be another pipeline 
> module, rather than some "magic" thing that tries very very hard to figure 
> out if it should be applied or not (and the "very very" part makes it 
> slow).

Only catch is that the cache has to control whether other pipeline 
modules get run...so there would have to be some sort of skip logic...

Regards,
Chris