Posted to dev@cocoon.apache.org by Unico Hommes <un...@hippo.nl> on 2004/03/02 12:43:30 UTC
Event caching and CachedSource
Hi gang :-)
A drawback I have been running into lately with the eventcache mechanism
is that it lacks the ability to remove heavy processing from the critical
path. An event simply removes a set of cached pipelines from the
cache completely, making the subsequent request for such a pipeline
potentially very slow. In applications where isolation is not a
requirement this is an unnecessary drawback.
I am looking at the excellent CachedSource stuff that is in the
scratchpad area ATM and am wondering how it fits together with the
eventcache stuff. One thing I am looking into right now is to write an
EventAware Refresher implementation.
For those unfamiliar with CachedSource, it is a Source wrapper that can
cache its delegate. Refreshing can be done either synchronously or
asynchronously, but currently only based upon a specified time-out. What
I'd like to do is generalize this a bit in order to add the ability to
externally trigger invalidation.
For this however I think a modification to the Refresher interface is
needed.
Instead of:
Refresher {
    refresh(key,uri,timeout);
    periodicallyRefresh(key,uri,timeout);
}
I'd like to remove timeout semantics from the interface:
Refresher {
    refresh(key,uri,params);
}
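A minimal Java sketch of how a single-method Refresher might look. Everything beyond the method name `refresh(key, uri, params)` is an assumption made for illustration: the `Map` parameter type, the `timeout` and `event` parameter names, and the two implementation classes are hypothetical and are not the scratchpad code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical unified interface: implementation-specific settings travel in params.
interface Refresher {
    void refresh(String key, String uri, Map<String, String> params);
}

// Timeout-based variant: reads an assumed "timeout" parameter (in seconds).
class DelayRefresher implements Refresher {
    final Map<String, Long> intervals = new HashMap<>();

    public void refresh(String key, String uri, Map<String, String> params) {
        long seconds = Long.parseLong(params.getOrDefault("timeout", "0"));
        // A real implementation would schedule a background job here.
        intervals.put(key, seconds);
    }
}

// Event-driven variant: no timeout, the key is registered under an assumed "event" name.
class EventAwareRefresher implements Refresher {
    final Map<String, List<String>> keysByEvent = new HashMap<>();

    public void refresh(String key, String uri, Map<String, String> params) {
        String event = params.getOrDefault("event", uri);
        keysByEvent.computeIfAbsent(event, e -> new ArrayList<>()).add(key);
    }

    // Keys that would need regenerating when the given event arrives.
    List<String> keysFor(String event) {
        return keysByEvent.getOrDefault(event, new ArrayList<>());
    }
}
```

With this shape the caller no longer needs to know whether refreshing is driven by a timer or by an external event; that choice lives in the implementation and its parameters.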
I don't think there is currently a reason for having two separate
methods, so I think we can safely combine them into one. But I
guess I am looking at Carsten for confirmation... :-)
Cheers,
Unico
Re: Event caching and CachedSource
Posted by Unico Hommes <un...@hippo.nl>.
Geoff Howard wrote:
> Unico Hommes wrote:
>
>> Geoff Howard wrote:
>>
>>> Unico Hommes wrote:
>>>
>>>> Hi gang :-)
>>>>
>>>> A drawback I have been running into lately with eventcache
>>>> mechanism is that it lacks the ability to remove heavy processing
>>>> from the critical path. An event will simply remove a set of cached
>>>> pipelines from the cache completely. Making the subsequent request
>>>> for such a pipeline potentialy very slow. In applications where
>>>> isolation is not a requirement this is an unnecessary drawback.
>>>
>>>
>>>
>>> Below sounds interesting and good but I haven't understood how event
>>> cache is related. AFAICS the only difference with eventcache and
>>> the other validity types is that for the others an invalid response
>>> is found in cache, but not used because it is found invalid after
>>> retrieval, but the event cache removes the entry at invalidation
>>> time since it knows it will never be useful. Both cases mean that
>>> the next person to request that resource will have to wait for the
>>> full generation. Maybe because I've only glanced at the refresher
>>> stuff?
>>>
>> I guess you are right that at the Cache level nothing really changes.
>> I overlooked that fact. I will do some more research on what is
>> required to accomplish that in the case of the Refresher, but my idea
>> was that the cached response would be served until a newly generated
>> one could replace the stale one. Since the Refresher talks to the
>> Cache directly, given the correct Validity strategy it can exercise
>> full control over it.
>
>
>
> So, stale entries are served until they can be regenerated? I've
> looked for this in the past (someone called it the "I'm Sorry" pattern
> :) ) and at the time thought it might be better implemented by a
> pluggable strategy at the pipeline execution level. Currently we have:
>
> - Assemble Pipeline
> - Gather key from Pipeline
> - Check cache for key
> - If object for key found, check its validity
> - If valid, serve the cached response
> - Else, execute pipeline and serve it.
>
> the cache point pipeline, and the non-caching pipeline are other
> implementations of different strategies, but are accomplished by
> inheritance instead of composing a Strategy. I haven't ever thought
> it through carefully but it seems like making those last 5 steps (as a
> group) a pluggable strategy would allow things like this "I'm Sorry"
> pattern, as well as more powerful concepts like Stefano's proposed
> adaptive cache. Just raw thoughts at this point...
I see two things at stake in my use case: the strategy pattern as you
call it (regular, inverted, 'I'm Sorry', adaptive, etc.) and the
granularity of objects in the cache. In my case it is very inefficient
to cache only complete pipelines, and I need multiple levels of
caching to optimize performance: besides the complete pipeline, I also
need to cache the individual sources that comprise a traversable generation.
I am not sure I understand what you mean by 'pluggable strategy'.
Isn't this what we already have with the different pipeline implementations?
Unico
Re: Event caching and CachedSource
Posted by Geoff Howard <co...@leverageweb.com>.
Unico Hommes wrote:
> Geoff Howard wrote:
>
>> Unico Hommes wrote:
>>
>>> Hi gang :-)
>>>
>>> A drawback I have been running into lately with eventcache mechanism
>>> is that it lacks the ability to remove heavy processing from the
>>> critical path. An event will simply remove a set of cached pipelines
>>> from the cache completely. Making the subsequent request for such a
>>> pipeline potentialy very slow. In applications where isolation is
>>> not a requirement this is an unnecessary drawback.
>>
>>
>> Below sounds interesting and good but I haven't understood how event
>> cache is related. AFAICS the only difference with eventcache and the
>> other validity types is that for the others an invalid response is
>> found in cache, but not used because it is found invalid after
>> retrieval, but the event cache removes the entry at invalidation time
>> since it knows it will never be useful. Both cases mean that the
>> next person to request that resource will have to wait for the full
>> generation. Maybe because I've only glanced at the refresher stuff?
>>
> I guess you are right that at the Cache level nothing really changes.
> I overlooked that fact. I will do some more research on what is
> required to accomplish that in the case of the Refresher, but my idea
> was that the cached response would be served until a newly generated
> one could replace the stale one. Since the Refresher talks to the
> Cache directly, given the correct Validity strategy it can exercise
> full control over it.
So, stale entries are served until they can be regenerated? I've looked
for this in the past (someone called it the "I'm Sorry" pattern :) ) and
at the time thought it might be better implemented by a pluggable
strategy at the pipeline execution level. Currently we have:
- Assemble Pipeline
- Gather key from Pipeline
- Check cache for key
- If object for key found, check its validity
- If valid, serve the cached response
- Else, execute pipeline and serve it.
The caching point pipeline and the non-caching pipeline are
implementations of other strategies, but they are accomplished by
inheritance instead of by composing a Strategy. I haven't ever thought it
through carefully, but it seems like making those last 5 steps (as a
group) a pluggable strategy would allow things like this "I'm Sorry"
pattern, as well as more powerful concepts like Stefano's proposed
adaptive cache. Just raw thoughts at this point...
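To illustrate the idea, here is a rough Java sketch of the last steps above expressed as a composable strategy. All names (`CacheStrategy`, `RegenerateOnMiss`, `ServeStale`, `CachedResponse`) are hypothetical, the `valid` flag stands in for a real validity check, and none of this is existing Cocoon API.

```java
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// A cached response; the flag stands in for a real validity check.
class CachedResponse {
    final String body;
    boolean valid = true;
    CachedResponse(String body) { this.body = body; }
}

// Hypothetical strategy covering "check cache, check validity, serve or execute".
interface CacheStrategy {
    String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline);
}

// Current behaviour: an invalid or missing entry is regenerated on the request path.
class RegenerateOnMiss implements CacheStrategy {
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse cached = cache.get(key);
        if (cached != null && cached.valid) {
            return cached.body;
        }
        String fresh = pipeline.get();
        cache.put(key, new CachedResponse(fresh));
        return fresh;
    }
}

// "I'm Sorry" variant: a stale entry is still served while regeneration is handed off.
class ServeStale implements CacheStrategy {
    private final Executor background;
    ServeStale(Executor background) { this.background = background; }

    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse cached = cache.get(key);
        if (cached == null) {
            // Nothing to fall back on, so the first request still pays the full cost.
            String fresh = pipeline.get();
            cache.put(key, new CachedResponse(fresh));
            return fresh;
        }
        if (!cached.valid) {
            // Regenerate off the request path; the stale body is returned below.
            background.execute(() -> cache.put(key, new CachedResponse(pipeline.get())));
        }
        return cached.body;
    }
}
```

If the executor runs tasks on another thread, the map passed in would need to be thread-safe (for example a `ConcurrentHashMap`); the sketch leaves that choice to the caller.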
>> Bottom line for me at moment is: do you foresee a need to modify the
>> eventcache API to accomodate this need? I'm getting ready to start a
>> discussion on changing the eventcache unstable status -- should I
>> hold off?
>>
> I don't think my current work will influence the eventcache API
> directly. Although I am not sure if
> the eventcache stuff can be considered stable enough. I still have
> some doubts about the ease of use of parts of it especially the way
> events are associated with cached objects. But lets discuss that
> separately.
Ah, good. Ok, I'll pick up on another thread.
Geoff
RE: Event caching and CachedSource
Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Unico Hommes wrote:
> > BTW, how does CachedSource accomplish something different from the
> > caching point pipeline (which seems to accomplish more, though I've
> > never used it).
> >
> I never used it either. So I really don't know. Perhaps
> someone else could comment on this?
>
The CachedSource caches a source :) whereas the caching point pipeline
caches part of a pipeline. They could be used in combination but have
different purposes.
The caching point pipeline can cache the beginning of a pipeline up to
the caching point, but this only works if all components in the pipeline
support caching; if not, nothing is cached.
Now, imagine that you have a database source that fetches content
from a slow database (or CMS). The usual caching algorithm checks
whether the source read by the generator has changed since the last call.
In the case of the database source this is not possible, and the
pipeline is never cached.
With the cached source the content fetched from the db is cached,
reducing the requests to the back-end system and the generator
can use this to test if the source has changed, allowing the
pipeline (or a part of it) to be cached as well.
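As a rough illustration of the point above, the following Java sketch wraps a slow read in a time-based cache and exposes the fetch time for a consumer to compare against. The class name `ExpiringCachedSource`, its methods, and the injected clock are assumptions for this example only; it is not the CachedSource implementation.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical wrapper: caches the delegate's content for a fixed period.
class ExpiringCachedSource {
    private final Supplier<String> delegate;  // e.g. a slow database or CMS read
    private final long expiresMillis;
    private final LongSupplier clock;         // injected so the behaviour is testable
    private String content;
    private long fetchedAt;
    private boolean loaded;

    ExpiringCachedSource(Supplier<String> delegate, long expiresMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.expiresMillis = expiresMillis;
        this.clock = clock;
    }

    // Returns cached content, hitting the delegate only when the entry has expired.
    synchronized String getContent() {
        long now = clock.getAsLong();
        if (!loaded || now - fetchedAt >= expiresMillis) {
            content = delegate.get();
            fetchedAt = now;
            loaded = true;
        }
        return content;
    }

    // Time of the last real fetch; a consumer can compare this with its own timestamp.
    synchronized long getLastFetched() {
        return fetchedAt;
    }
}
```

A caller could construct it with `System::currentTimeMillis` as the clock; as long as `getLastFetched()` has not moved, anything built from the content can be treated as unchanged.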
HTH
Carsten
Re: Event caching and CachedSource
Posted by Unico Hommes <un...@hippo.nl>.
Geoff Howard wrote:
> Unico Hommes wrote:
>
>> Hi gang :-)
>>
>> A drawback I have been running into lately with eventcache mechanism
>> is that it lacks the ability to remove heavy processing from the
>> critical path. An event will simply remove a set of cached pipelines
>> from the cache completely. Making the subsequent request for such a
>> pipeline potentialy very slow. In applications where isolation is not
>> a requirement this is an unnecessary drawback.
>
>
>
> Below sounds interesting and good but I haven't understood how event
> cache is related. AFAICS the only difference with eventcache and the
> other validity types is that for the others an invalid response is
> found in cache, but not used because it is found invalid after
> retrieval, but the event cache removes the entry at invalidation time
> since it knows it will never be useful. Both cases mean that the next
> person to request that resource will have to wait for the full
> generation. Maybe because I've only glanced at the refresher stuff?
>
I guess you are right that at the Cache level nothing really changes. I
overlooked that fact. I will do some more research on what is required
to accomplish that in the case of the Refresher, but my idea was that
the cached response would be served until a newly generated one could
replace the stale one. Since the Refresher talks to the Cache directly,
given the correct Validity strategy it can exercise full control over it.
> Bottom line for me at moment is: do you foresee a need to modify the
> eventcache API to accomodate this need? I'm getting ready to start a
> discussion on changing the eventcache unstable status -- should I hold
> off?
>
I don't think my current work will influence the eventcache API
directly, although I am not sure whether the eventcache stuff can be
considered stable enough. I still have some doubts about the ease of
use of parts of it, especially the way events are associated with
cached objects. But let's discuss that separately.
>> I am looking at the excellent CachedSource stuff that is in the
>> scratchpad area ATM and am wondering how it fits together with the
>> eventcache stuff. One thing I am looking into right now is to write
>> an EventAware Refresher implementation.
>>
>> For those unfamiliar with CachedSource, it is a Source wrapper that
>> can cache a its delegate. Refreshing can be done either synchronously
>> or asynchronously but currently only based upon a specified time-out.
>> What I'd like to do is generalize this a bit in order to add the
>> ability to externally trigger invalidation.
>>
>> For this however I think a modification to the Refresher interface is
>> needed.
>
>
>
> BTW, how does CachedSource accomplish something different from the
> caching point pipeline (which seems to accomplish more, though I've
> never used it).
>
I never used it either. So I really don't know. Perhaps someone else
could comment on this?
Cheers,
Unico
Re: Event caching and CachedSource
Posted by Geoff Howard <co...@leverageweb.com>.
Unico Hommes wrote:
> Hi gang :-)
>
> A drawback I have been running into lately with eventcache mechanism
> is that it lacks the ability to remove heavy processing from the
> critical path. An event will simply remove a set of cached pipelines
> from the cache completely. Making the subsequent request for such a
> pipeline potentialy very slow. In applications where isolation is not
> a requirement this is an unnecessary drawback.
Below sounds interesting and good but I haven't understood how event
cache is related. AFAICS the only difference with eventcache and the
other validity types is that for the others an invalid response is found
in cache, but not used because it is found invalid after retrieval, but
the event cache removes the entry at invalidation time since it knows it
will never be useful. Both cases mean that the next person to request
that resource will have to wait for the full generation. Maybe because
I've only glanced at the refresher stuff?
Bottom line for me at the moment is: do you foresee a need to modify the
eventcache API to accommodate this need? I'm getting ready to start a
discussion on changing the eventcache unstable status -- should I hold off?
> I am looking at the excellent CachedSource stuff that is in the
> scratchpad area ATM and am wondering how it fits together with the
> eventcache stuff. One thing I am looking into right now is to write an
> EventAware Refresher implementation.
>
> For those unfamiliar with CachedSource, it is a Source wrapper that
> can cache a its delegate. Refreshing can be done either synchronously
> or asynchronously but currently only based upon a specified time-out.
> What I'd like to do is generalize this a bit in order to add the
> ability to externally trigger invalidation.
>
> For this however I think a modification to the Refresher interface is
> needed.
BTW, how does CachedSource accomplish something different from the
caching point pipeline (which seems to accomplish more, though I've
never used it)?
Geoff
Re: Event caching and CachedSource
Posted by Unico Hommes <un...@hippo.nl>.
Vadim Gritsenko wrote:
> Unico Hommes wrote:
>
>> Carsten Ziegeler wrote:
>>
>>> Unico Hommes wrote:
>>>
>>>> I'd also like to change the protocol URL a little bit. Since the
>>>> timeout parameter will only be applicable to the delay refresher
>>>> implementation and not to the event aware one I think it would be
>>>> better to specify it with a query parameter instead.
>>>>
>>>> Current syntax: cache://60@main@http://www.apache.org/
>>>> Proposed syntax:
>>>> cache:http://www.apache.org/?cache-expires=60&cache-name=main
>>>>
>>>> The protocol:subprotocol syntax is also more in line with well
>>>> established conventions such as in jdbc for instance.
>>>>
>>>> Let me know if you have any objections or comments.
>>>>
>>>
>>>
>>> No objections from me, but the parameters must have clear names,
>>> which means there shouldn't be a conflict. Imagine:
>>>
>>> cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
>>>
>>>
>>> (Dumb example, I know) But what I mean is that the real url/source
>>> could also have parameters and it must be clear which ones are
>>> for the cache source and which ones are for the real source,
>>> so perhaps something like "cocoon-cache..." or perhaps better
>>> using invalid names like "cocoon:cache=60"?
>>>
>> Yeah I had been thinkin along the same lines. I like the colon
>> notation because it resembles familiar namespace notation. So I'll go
>> with your latter suggestion.
>
>
>
> Does it make sense to have it both ways? So, say, you can use either:
> cache:main:60@http://www.apache.org/
> or:
> cache:@http://www.apache.org/?cache:name=main&cache:expires=60
> ?
>
>
Hmm, I would prefer to settle on just one syntax. It prevents confusion and
minimizes the amount of code to maintain. Also, what should we do when the
expiration value is not applicable: ignore it or throw an exception? I think
we should keep it as simple as possible.
Unico
Re: Event caching and CachedSource
Posted by Vadim Gritsenko <va...@reverycodes.com>.
Unico Hommes wrote:
> Carsten Ziegeler wrote:
>
>> Unico Hommes wrote:
>>
>>> I'd also like to change the protocol URL a little bit. Since the
>>> timeout parameter will only be applicable to the delay refresher
>>> implementation and not to the event aware one I think it would be
>>> better to specify it with a query parameter instead.
>>>
>>> Current syntax: cache://60@main@http://www.apache.org/
>>> Proposed syntax:
>>> cache:http://www.apache.org/?cache-expires=60&cache-name=main
>>>
>>> The protocol:subprotocol syntax is also more in line with well
>>> established conventions such as in jdbc for instance.
>>>
>>> Let me know if you have any objections or comments.
>>>
>>
>> No objections from me, but the parameters must have clear names,
>> which means there shouldn't be a conflict. Imagine:
>>
>> cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
>>
>>
>> (Dumb example, I know) But what I mean is that the real url/source
>> could also have parameters and it must be clear which ones are
>> for the cache source and which ones are for the real source,
>> so perhaps something like "cocoon-cache..." or perhaps better
>> using invalid names like "cocoon:cache=60"?
>>
> Yeah I had been thinkin along the same lines. I like the colon
> notation because it resembles familiar namespace notation. So I'll go
> with your latter suggestion.
Does it make sense to have it both ways? So, say, you can use either:
cache:main:60@http://www.apache.org/
or:
cache:@http://www.apache.org/?cache:name=main&cache:expires=60
?
Vadim
Re: Event caching and CachedSource
Posted by Unico Hommes <un...@hippo.nl>.
Carsten Ziegeler wrote:
>Unico Hommes wrote:
>
>
>
>>I'd also like to change the protocol URL a little bit. Since
>>the timeout parameter will only be applicable to the delay
>>refresher implementation and not to the event aware one I
>>think it would be better to specify it with a query parameter instead.
>>
>>Current syntax: cache://60@main@http://www.apache.org/
>>Proposed syntax:
>>cache:http://www.apache.org/?cache-expires=60&cache-name=main
>>
>>The protocol:subprotocol syntax is also more in line with
>>well established conventions such as in jdbc for instance.
>>
>>Let me know if you have any objections or comments.
>>
>>
>>
>No objections from me, but the parameters must have clear names,
>which means there shouldn't be a conflict. Imagine:
>
>cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
>
>(Dumb example, I know) But what I mean is that the real url/source
>could also have parameters and it must be clear which ones are
>for the cache source and which ones are for the real source,
>so perhaps something like "cocoon-cache..." or perhaps better
>using invalid names like "cocoon:cache=60"?
>
>
>
Yeah, I had been thinking along the same lines. I like the colon notation
because it resembles familiar namespace notation. So I'll go with your
latter suggestion.
Unico
RE: Event caching and CachedSource
Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Unico Hommes wrote:
>
> I'd also like to change the protocol URL a little bit. Since
> the timeout parameter will only be applicable to the delay
> refresher implementation and not to the event aware one I
> think it would be better to specify it with a query parameter instead.
>
> Current syntax: cache://60@main@http://www.apache.org/
> Proposed syntax:
> cache:http://www.apache.org/?cache-expires=60&cache-name=main
>
> The protocol:subprotocol syntax is also more in line with
> well established conventions such as in jdbc for instance.
>
> Let me know if you have any objections or comments.
>
No objections from me, but the parameters must have clear names,
which means there shouldn't be a conflict. Imagine:
cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
(Dumb example, I know) But what I mean is that the real url/source
could also have parameters and it must be clear which ones are
for the cache source and which ones are for the real source,
so perhaps something like "cocoon-cache..." or perhaps better
using invalid names like "cocoon:cache=60"?
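One way to keep the two parameter sets apart is to reserve a prefix for the cache source and strip those parameters before the real source sees the URI. The Java sketch below assumes a `cocoon:cache-` prefix and a class named `CacheUri`; both are made up for this example, and the final syntax was still under discussion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for cache:<real-uri>?<params> with a reserved parameter prefix.
class CacheUri {
    static final String SCHEME = "cache:";
    static final String PREFIX = "cocoon:cache-"; // assumed reserved prefix

    final String sourceUri;                       // what the wrapped source receives
    final Map<String, String> cacheParams = new LinkedHashMap<>();

    CacheUri(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not a cache: URI: " + uri);
        }
        String rest = uri.substring(SCHEME.length());
        int q = rest.indexOf('?');
        String source = rest;
        if (q >= 0) {
            List<String> kept = new ArrayList<>();
            for (String pair : rest.substring(q + 1).split("&")) {
                if (pair.startsWith(PREFIX)) {
                    // Parameter for the cache source: strip the prefix and record it.
                    int eq = pair.indexOf('=');
                    String name = eq < 0 ? pair.substring(PREFIX.length())
                                         : pair.substring(PREFIX.length(), eq);
                    cacheParams.put(name, eq < 0 ? "" : pair.substring(eq + 1));
                } else if (!pair.isEmpty()) {
                    // Parameter for the real source: pass it through untouched.
                    kept.add(pair);
                }
            }
            String base = rest.substring(0, q);
            source = kept.isEmpty() ? base : base + "?" + String.join("&", kept);
        }
        this.sourceUri = source;
    }
}
```

For the conflicting example above, the real source would keep its own `expires=500` while the cache source reads its expiry from the prefixed parameter.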
Carsten
Re: Event caching and CachedSource
Posted by Unico Hommes <un...@hippo.nl>.
Carsten Ziegeler wrote:
>Unico Hommes wrote:
>
>
>>Hi gang :-)
>>
>>A drawback I have been running into lately with eventcache
>>mechanism is that it lacks the ability to remove heavy
>>processing from the critical path. An event will simply
>>remove a set of cached pipelines from the cache completely.
>>Making the subsequent request for such a pipeline potentialy
>>very slow. In applications where isolation is not a
>>requirement this is an unnecessary drawback.
>>
>>I am looking at the excellent CachedSource stuff that is in
>>the scratchpad area ATM and am wondering how it fits together
>>with the eventcache stuff. One thing I am looking into right
>>now is to write an EventAware Refresher implementation.
>>
>>For those unfamiliar with CachedSource, it is a Source
>>wrapper that can cache a its delegate. Refreshing can be done
>>either synchronously or asynchronously but currently only
>>based upon a specified time-out. What I'd like to do is
>>generalize this a bit in order to add the ability to
>>externally trigger invalidation.
>>
>>For this however I think a modification to the Refresher
>>interface is needed.
>>
>>Instead of:
>>
>>Refresher {
>> refresh(key,uri,timeout);
>> periodicallyRefresh(key,uri,timeout);
>>}
>>
>>I'd like to remove timeout semantics from the interface:
>>
>>Refresher {
>> refresh(key,uri,params);
>>}
>>
>>I don't think there is currently a reason for there being two
>>the separate methods. So I think we can safely combine them
>>into one. But I guess I am looking at Carsten for confirmation... :-)
>>
>>
>>
>Although you actually don't need my confirmation as it's not my
>but *our* source, here it is :)
>I think this makes sense and I think we should also move this
>out of the scratchpad afterwards as well.
>
>
I'd also like to change the protocol URL a little bit. Since the timeout
parameter will only be applicable to the delay refresher implementation
and not to the event aware one I think it would be better to specify it
with a query parameter instead.
Current syntax: cache://60@main@http://www.apache.org/
Proposed syntax:
cache:http://www.apache.org/?cache-expires=60&cache-name=main
The protocol:subprotocol syntax is also more in line with well
established conventions, such as the one used for JDBC URLs.
Let me know if you have any objections or comments.
Unico
Re: Event caching and CachedSource
Posted by Unico Hommes <un...@hippo.nl>.
Carsten Ziegeler wrote:
>Unico Hommes wrote:
>
>
>>Hi gang :-)
>>
>>A drawback I have been running into lately with eventcache
>>mechanism is that it lacks the ability to remove heavy
>>processing from the critical path. An event will simply
>>remove a set of cached pipelines from the cache completely.
>>Making the subsequent request for such a pipeline potentialy
>>very slow. In applications where isolation is not a
>>requirement this is an unnecessary drawback.
>>
>>I am looking at the excellent CachedSource stuff that is in
>>the scratchpad area ATM and am wondering how it fits together
>>with the eventcache stuff. One thing I am looking into right
>>now is to write an EventAware Refresher implementation.
>>
>>For those unfamiliar with CachedSource, it is a Source
>>wrapper that can cache a its delegate. Refreshing can be done
>>either synchronously or asynchronously but currently only
>>based upon a specified time-out. What I'd like to do is
>>generalize this a bit in order to add the ability to
>>externally trigger invalidation.
>>
>>For this however I think a modification to the Refresher
>>interface is needed.
>>
>>Instead of:
>>
>>Refresher {
>> refresh(key,uri,timeout);
>> periodicallyRefresh(key,uri,timeout);
>>}
>>
>>I'd like to remove timeout semantics from the interface:
>>
>>Refresher {
>> refresh(key,uri,params);
>>}
>>
>>I don't think there is currently a reason for there being two
>>the separate methods. So I think we can safely combine them
>>into one. But I guess I am looking at Carsten for confirmation... :-)
>>
>>
>>
>Although you actually don't need my confirmation as it's not my
>but *our* source, here it is :)
>
>
OK, thanks. Just trying to exclude the possibility of overlooking something
and allowing you the opportunity to comment on any changes beforehand.
>I think this makes sense and I think we should also move this
>out of the scratchpad afterwards as well.
>
>
>
OK, agreed. But where should it go?
Unico
RE: Event caching and CachedSource
Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Unico Hommes wrote:
>
> Hi gang :-)
>
> A drawback I have been running into lately with eventcache
> mechanism is that it lacks the ability to remove heavy
> processing from the critical path. An event will simply
> remove a set of cached pipelines from the cache completely.
> Making the subsequent request for such a pipeline potentialy
> very slow. In applications where isolation is not a
> requirement this is an unnecessary drawback.
>
> I am looking at the excellent CachedSource stuff that is in
> the scratchpad area ATM and am wondering how it fits together
> with the eventcache stuff. One thing I am looking into right
> now is to write an EventAware Refresher implementation.
>
> For those unfamiliar with CachedSource, it is a Source
> wrapper that can cache a its delegate. Refreshing can be done
> either synchronously or asynchronously but currently only
> based upon a specified time-out. What I'd like to do is
> generalize this a bit in order to add the ability to
> externally trigger invalidation.
>
> For this however I think a modification to the Refresher
> interface is needed.
>
> Instead of:
>
> Refresher {
> refresh(key,uri,timeout);
> periodicallyRefresh(key,uri,timeout);
> }
>
> I'd like to remove timeout semantics from the interface:
>
> Refresher {
> refresh(key,uri,params);
> }
>
> I don't think there is currently a reason for there being two
> the separate methods. So I think we can safely combine them
> into one. But I guess I am looking at Carsten for confirmation... :-)
>
Although you actually don't need my confirmation, as it's not my
source but *our* source, here it is :)
I think this makes sense and I think we should also move this
out of the scratchpad afterwards as well.
Carsten