Posted to dev@cocoon.apache.org by Unico Hommes <un...@hippo.nl> on 2004/03/02 12:43:30 UTC

Event caching and CachedSource

Hi gang :-)

A drawback I have been running into lately with the eventcache mechanism
is that it lacks the ability to remove heavy processing from the critical
path. An event simply removes a set of cached pipelines from the cache
completely, making the subsequent request for such a pipeline
potentially very slow. In applications where isolation is not a
requirement this is an unnecessary drawback.

I am looking at the excellent CachedSource stuff that is in the 
scratchpad area ATM and am wondering how it fits together with the 
eventcache stuff. One thing I am looking into right now is to write an 
EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source wrapper that can
cache its delegate. Refreshing can be done either synchronously or
asynchronously, but currently only based upon a specified time-out. What
I'd like to do is generalize this a bit in order to add the ability to
externally trigger invalidation.

For this however I think a modification to the Refresher interface is 
needed.

Instead of:

Refresher {
  refresh(key,uri,timeout);
  periodicallyRefresh(key,uri,timeout);
}

I'd like to remove timeout semantics from the interface:

Refresher {
  refresh(key,uri,params);
}

I don't think there is currently a reason for having two separate
methods, so I think we can safely combine them into one. But I
guess I am looking at Carsten for confirmation... :-)
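
A minimal Java sketch of what the combined interface could look like.
Only the names Refresher and refresh come from the proposal above; the
Map-based params, the "timeout" and "event" parameter names, and both
implementing classes are assumptions made for illustration, not the
scratchpad code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical shape of the proposed interface: one method, no timeout argument.
interface Refresher {
    void refresh(String key, String uri, Map<String, String> params);
}

// Time-based refresher: the timeout becomes an ordinary parameter.
class DelayRefresher implements Refresher {
    // key -> refresh interval in seconds (a real implementation would schedule a task)
    final Map<String, Long> intervals = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        String timeout = params.get("timeout"); // assumed parameter name
        if (timeout == null) {
            throw new IllegalArgumentException("DelayRefresher requires a timeout");
        }
        intervals.put(key, Long.parseLong(timeout));
    }
}

// Event-driven refresher: ignores timeouts and waits for an external trigger.
class EventAwareRefresher implements Refresher {
    // event name -> cache keys to refresh when that event arrives
    final Map<String, Set<String>> keysByEvent = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        // Assumption: when no event name is given, the URI itself names the event.
        String event = params.getOrDefault("event", uri);
        keysByEvent.computeIfAbsent(event, e -> new HashSet<>()).add(key);
    }
}
```

With this shape the caller does not need to know which kind of
refresher it is talking to; each implementation picks the parameters
it understands.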

Cheers,
Unico

Re: Event caching and CachedSource

Posted by Unico Hommes <un...@hippo.nl>.

Geoff Howard wrote:

> Unico Hommes wrote:
>
>> Geoff Howard wrote:
>>
>>> Unico Hommes wrote:
>>>
>>>> Hi gang :-)
>>>>
>>>> A drawback I have been running into lately with eventcache 
>>>> mechanism is that it lacks the ability to remove heavy processing 
>>>> from the critical path. An event will simply remove a set of cached 
>>>> pipelines from the cache completely. Making the subsequent request 
>>>> for such a pipeline potentially very slow. In applications where
>>>> isolation is not a requirement this is an unnecessary drawback.
>>>
>>>
>>>
>>> Below sounds interesting and good but I haven't understood how event 
>>> cache is related.  AFAICS the only difference with eventcache and 
>>> the other validity types is that for the others an invalid response 
>>> is found in cache, but not used because it is found invalid after 
>>> retrieval, but the event cache removes the entry at invalidation 
>>> time since it knows it will never be useful.  Both cases mean that 
>>> the next person to request that resource will have to wait for the 
>>> full generation.  Maybe because I've only glanced at the refresher 
>>> stuff?
>>>
>> I guess you are right that at the Cache level nothing really changes. 
>> I overlooked that fact. I will do some more research on what is 
>> required to accomplish that in the case of the Refresher, but my idea 
>> was that the cached response would be served until a newly generated 
>> one could replace the stale one. Since the Refresher talks to the 
>> Cache directly, given the correct Validity strategy it can exercise 
>> full control over it.
>
>
>
> So, stale entries are served until they can be regenerated?  I've 
> looked for this in the past (someone called it the "I'm Sorry" pattern 
> :) ) and at the time thought it might be better implemented by a 
> pluggable strategy at the pipeline execution level.  Currently we have:
>
> - Assemble Pipeline
> - Gather key from Pipeline
> - Check cache for key
> - If object for key found, check its validity
> - If valid, serve the cached response
> - Else, execute pipeline and serve it.
>
> the cache point pipeline, and the non-caching pipeline are other 
> implementations of different strategies, but are accomplished by 
> inheritance instead of composing a Strategy.  I haven't ever thought 
> it through carefully but it seems like making those last 5 steps (as a 
> group) a pluggable strategy would allow things like this "I'm Sorry" 
> pattern, as well as more powerful concepts like Stefano's proposed 
> adaptive cache.  Just raw thoughts at this point...


I see two things at stake in my use case: the strategy pattern, as you
call it (regular, inverted, "I'm Sorry", adaptive, etc.), and the
granularity of objects in the cache. In my case it is very inefficient
to cache only complete pipelines, and I need multiple levels of
caching to optimize performance: besides the complete pipeline, I also
need to cache the individual sources that comprise a traversable
generation.

I am not sure I understand what you mean by 'pluggable strategy'.
Isn't this what we already have with the different pipeline implementations?

Unico

Re: Event caching and CachedSource

Posted by Geoff Howard <co...@leverageweb.com>.

Unico Hommes wrote:

> Geoff Howard wrote:
>
>> Unico Hommes wrote:
>>
>>> Hi gang :-)
>>>
>>> A drawback I have been running into lately with eventcache mechanism 
>>> is that it lacks the ability to remove heavy processing from the 
>>> critical path. An event will simply remove a set of cached pipelines 
>>> from the cache completely. Making the subsequent request for such a 
>>> pipeline potentially very slow. In applications where isolation is
>>> not a requirement this is an unnecessary drawback.
>>
>>
>> Below sounds interesting and good but I haven't understood how event 
>> cache is related.  AFAICS the only difference with eventcache and the 
>> other validity types is that for the others an invalid response is 
>> found in cache, but not used because it is found invalid after 
>> retrieval, but the event cache removes the entry at invalidation time 
>> since it knows it will never be useful.  Both cases mean that the 
>> next person to request that resource will have to wait for the full 
>> generation.  Maybe because I've only glanced at the refresher stuff?
>>
> I guess you are right that at the Cache level nothing really changes. 
> I overlooked that fact. I will do some more research on what is 
> required to accomplish that in the case of the Refresher, but my idea 
> was that the cached response would be served until a newly generated 
> one could replace the stale one. Since the Refresher talks to the 
> Cache directly, given the correct Validity strategy it can exercise 
> full control over it.


So, stale entries are served until they can be regenerated?  I've looked 
for this in the past (someone called it the "I'm Sorry" pattern :) ) and 
at the time thought it might be better implemented by a pluggable 
strategy at the pipeline execution level.  Currently we have:

- Assemble Pipeline
- Gather key from Pipeline
- Check cache for key
- If object for key found, check its validity
- If valid, serve the cached response
- Else, execute pipeline and serve it.

The caching point pipeline and the non-caching pipeline are other
implementations of different strategies, but they are accomplished by
inheritance instead of by composing a Strategy.  I haven't ever thought
it through carefully, but it seems like making those last 5 steps (as a
group) a pluggable strategy would allow things like this "I'm Sorry"
pattern, as well as more powerful concepts like Stefano's proposed
adaptive cache.  Just raw thoughts at this point...
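
A rough Java sketch of the idea, in which the lookup steps listed above
sit behind one interface so that a "serve stale" variant can be swapped
in. All names here (CachedResponse, CacheStrategy, RegularStrategy,
ServeStaleStrategy) are hypothetical, and the boolean valid flag merely
stands in for a real validity check; this is not how the pipeline
classes are written today.

```java
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// A cached response plus a flag standing in for a real validity check.
class CachedResponse {
    final String body;
    volatile boolean valid = true;
    CachedResponse(String body) { this.body = body; }
}

// The pluggable part: decide what to serve for a key (hypothetical interface).
interface CacheStrategy {
    String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline);
}

// Current behaviour: an invalid or missing entry is regenerated on the critical path.
class RegularStrategy implements CacheStrategy {
    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit != null && hit.valid) {
            return hit.body;
        }
        String fresh = pipeline.get();
        cache.put(key, new CachedResponse(fresh));
        return fresh;
    }
}

// "I'm Sorry": serve the stale entry immediately and regenerate in the background.
class ServeStaleStrategy implements CacheStrategy {
    private final Executor background;

    ServeStaleStrategy(Executor background) { this.background = background; }

    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit == null) {
            // Nothing cached yet: the first request still pays the full cost.
            String fresh = pipeline.get();
            cache.put(key, new CachedResponse(fresh));
            return fresh;
        }
        if (!hit.valid) {
            // Simplification: concurrent requests may each schedule a regeneration.
            background.execute(() -> cache.put(key, new CachedResponse(pipeline.get())));
        }
        return hit.body;
    }
}
```

Composing the strategy rather than inheriting it means the pipeline
class only needs a reference to a CacheStrategy, and an adaptive
variant could be added without touching the other implementations.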

>> Bottom line for me at moment is: do you foresee a need to modify the 
>> eventcache API to accommodate this need?  I'm getting ready to start a
>> discussion on changing the eventcache unstable status -- should I 
>> hold off?
>>
> I don't think my current work will influence the eventcache API 
> directly. Although I am not sure if
> the eventcache stuff can be considered stable enough. I still have 
> some doubts about the ease of use of parts of it especially the way 
> events are associated with cached objects. But let's discuss that
> separately.


Ah, good.  Ok, I'll pick up on another thread.

Geoff

RE: Event caching and CachedSource

Posted by Carsten Ziegeler <cz...@s-und-n.de>.

Unico Hommes wrote:

> > BTW, how does CachedSource accomplish something different from the 
> > caching point pipeline (which seems to accomplish more, though I've 
> > never used it).
> >
> I never used it either. So I really don't know. Perhaps 
> someone else could comment on this?
> 
The CachedSource caches a source :) whereas the caching point pipeline
caches part of a pipeline. They could be used in combination but have
different purposes.
The caching point pipeline can cache the beginning of a pipeline up to
the caching point, but this only works if all components in the
pipeline support caching; if not, nothing is cached.

Now, imagine that you have a database source that fetches content
from a slow database (or CMS). The usual caching algorithm checks
whether the source read by the generator has changed since the last
call. In the case of the database source this is not possible, so the
pipeline is never cached.
With the cached source, the content fetched from the database is
cached, reducing the requests to the back-end system, and the generator
can use it to test whether the source has changed, allowing the
pipeline (or a part of it) to be cached as well.
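
The database example can be sketched in Java roughly as follows. This
is a simplified stand-in, not the actual CachedSource class: the class
name CachingSource, the Supplier-based delegate, and the explicit now
argument (used instead of the system clock so the behaviour is easy to
follow) are all assumptions.

```java
import java.util.function.Supplier;

// Wraps a slow delegate, caches its content, and remembers when it was fetched.
class CachingSource {
    private final Supplier<String> delegate; // stands in for the slow database source
    private final long expiresMillis;
    private String content;
    private long fetchedAt = -1;

    CachingSource(Supplier<String> delegate, long expiresMillis) {
        this.delegate = delegate;
        this.expiresMillis = expiresMillis;
    }

    // Returns the cached content, hitting the delegate only when missing or expired.
    synchronized String getContent(long now) {
        if (content == null || now - fetchedAt >= expiresMillis) {
            content = delegate.get();
            fetchedAt = now;
        }
        return content;
    }

    // A timestamp a generator could compare against to decide whether
    // a cached pipeline built from this source is still valid.
    synchronized long getLastModified() {
        return fetchedAt;
    }
}
```

Because the wrapper can report when its content last changed, the
pipeline that reads from it gains a validity it can check, which the
raw database source could not provide.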

HTH
Carsten

Re: Event caching and CachedSource

Posted by Unico Hommes <un...@hippo.nl>.

Geoff Howard wrote:

> Unico Hommes wrote:
>
>> Hi gang :-)
>>
>> A drawback I have been running into lately with eventcache mechanism 
>> is that it lacks the ability to remove heavy processing from the 
>> critical path. An event will simply remove a set of cached pipelines 
>> from the cache completely. Making the subsequent request for such a 
>> pipeline potentially very slow. In applications where isolation is not
>> a requirement this is an unnecessary drawback.
>
>
>
> Below sounds interesting and good but I haven't understood how event 
> cache is related.  AFAICS the only difference with eventcache and the 
> other validity types is that for the others an invalid response is 
> found in cache, but not used because it is found invalid after 
> retrieval, but the event cache removes the entry at invalidation time 
> since it knows it will never be useful.  Both cases mean that the next 
> person to request that resource will have to wait for the full 
> generation.  Maybe because I've only glanced at the refresher stuff?
>
I guess you are right that at the Cache level nothing really changes. I 
overlooked that fact. I will do some more research on what is required 
to accomplish that in the case of the Refresher, but my idea was that 
the cached response would be served until a newly generated one could 
replace the stale one. Since the Refresher talks to the Cache directly, 
given the correct Validity strategy it can exercise full control over it.
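
A small Java sketch of that idea: the entry is never removed when an
event arrives; it is regenerated first and only then replaced, so
readers keep getting the old response in the meantime. The class name
EventTriggeredCache, its methods, and the Function used to fetch
content are hypothetical and only illustrate the control flow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class EventTriggeredCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> uriByKey = new ConcurrentHashMap<>();
    private final Function<String, String> fetch; // stands in for reading the delegate source

    EventTriggeredCache(Function<String, String> fetch) {
        this.fetch = fetch;
    }

    // Register a key and populate it once.
    void register(String key, String uri) {
        uriByKey.put(key, uri);
        cache.put(key, fetch.apply(uri));
    }

    // Readers always get whatever is currently cached, stale or not.
    String get(String key) {
        return cache.get(key);
    }

    // Called when an external event arrives. The new content is generated
    // before the old entry is replaced, so nothing is evicted in between.
    // A real refresher would run this on a background thread.
    void onEvent(String key) {
        String uri = uriByKey.get(key);
        if (uri != null) {
            cache.put(key, fetch.apply(uri));
        }
    }
}
```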

> Bottom line for me at moment is: do you foresee a need to modify the 
> eventcache API to accommodate this need?  I'm getting ready to start a
> discussion on changing the eventcache unstable status -- should I hold 
> off?
>
I don't think my current work will influence the eventcache API
directly, although I am not sure the eventcache stuff can be considered
stable enough. I still have some doubts about the ease of use of parts
of it, especially the way events are associated with cached objects.
But let's discuss that separately.

>> I am looking at the excellent CachedSource stuff that is in the 
>> scratchpad area ATM and am wondering how it fits together with the 
>> eventcache stuff. One thing I am looking into right now is to write 
>> an EventAware Refresher implementation.
>>
>> For those unfamiliar with CachedSource, it is a Source wrapper that 
>> can cache its delegate. Refreshing can be done either synchronously
>> or asynchronously but currently only based upon a specified time-out. 
>> What I'd like to do is generalize this a bit in order to add the 
>> ability to  externally trigger invalidation.
>>
>> For this however I think a modification to the Refresher interface is 
>> needed.
>
>
>
> BTW, how does CachedSource accomplish something different from the 
> caching point pipeline (which seems to accomplish more, though I've 
> never used it).
>
I never used it either. So I really don't know. Perhaps someone else 
could comment on this?

Cheers,
Unico

Re: Event caching and CachedSource

Posted by Geoff Howard <co...@leverageweb.com>.

Unico Hommes wrote:

> Hi gang :-)
>
> A drawback I have been running into lately with eventcache mechanism 
> is that it lacks the ability to remove heavy processing from the 
> critical path. An event will simply remove a set of cached pipelines 
> from the cache completely. Making the subsequent request for such a 
> pipeline potentially very slow. In applications where isolation is not
> a requirement this is an unnecessary drawback.

Below sounds interesting and good but I haven't understood how event 
cache is related.  AFAICS the only difference with eventcache and the 
other validity types is that for the others an invalid response is found 
in cache, but not used because it is found invalid after retrieval, but 
the event cache removes the entry at invalidation time since it knows it 
will never be useful.  Both cases mean that the next person to request 
that resource will have to wait for the full generation.  Maybe because 
I've only glanced at the refresher stuff?

Bottom line for me at the moment is: do you foresee a need to modify the
eventcache API to accommodate this need?  I'm getting ready to start a
discussion on changing the eventcache unstable status -- should I hold off?

> I am looking at the excellent CachedSource stuff that is in the 
> scratchpad area ATM and am wondering how it fits together with the 
> eventcache stuff. One thing I am looking into right now is to write an 
> EventAware Refresher implementation.
>
> For those unfamiliar with CachedSource, it is a Source wrapper that 
> can cache its delegate. Refreshing can be done either synchronously
> or asynchronously but currently only based upon a specified time-out. 
> What I'd like to do is generalize this a bit in order to add the 
> ability to  externally trigger invalidation.
>
> For this however I think a modification to the Refresher interface is 
> needed.

BTW, how does CachedSource accomplish something different from the
caching point pipeline (which seems to accomplish more, though I've
never used it)?

Geoff

Re: Event caching and CachedSource

Posted by Unico Hommes <un...@hippo.nl>.

Vadim Gritsenko wrote:

> Unico Hommes wrote:
>
>> Carsten Ziegeler wrote:
>>
>>> Unico Hommes wrote:
>>>
>>>> I'd also like to change the protocol URL a little bit. Since the 
>>>> timeout parameter will only be applicable to the delay refresher 
>>>> implementation and not to the event aware one I think it would be 
>>>> better to specify it with a query parameter instead.
>>>>
>>>> Current syntax: cache://60@main@http://www.apache.org/
>>>> Proposed syntax: 
>>>> cache:http://www.apache.org/?cache-expires=60&cache-name=main
>>>>
>>>> The protocol:subprotocol syntax is also more in line with well 
>>>> established conventions such as in jdbc for instance.
>>>>
>>>> Let me know if you have any objections or comments.
>>>>   
>>>
>>>
>>> No objections from me, but the parameters must have clear names, 
>>> which means there shouldn't be a conflict. Imagine:
>>>
>>> cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500 
>>>
>>>
>>> (Dumb example, I know) But what I mean is that the real url/source
>>> could also have parameters and it must be clear which ones are
>>> for the cache source and which ones are for the real source,
>>> so perhaps something like "cocoon-cache..." or perhaps better
>>> using invalid names like "cocoon:cache=60"?
>>>
>> Yeah I had been thinking along the same lines. I like the colon
>> notation because it resembles familiar namespace notation. So I'll go 
>> with your latter suggestion.
>
>
>
> Does it make sense to have it both ways? So, say, you can use either:
>    cache:main:60@http://www.apache.org/
> or:
>    cache:@http://www.apache.org/?cache:name=main&cache:expires=60
> ?
>
>

Hmm, I would prefer to settle on just one syntax. That prevents
confusion and minimizes the amount of code to maintain. Also, what
should happen when the expiration value is not applicable: ignore it
or throw an exception? I think we should keep it as simple as possible.

Unico

Re: Event caching and CachedSource

Posted by Vadim Gritsenko <va...@reverycodes.com>.

Unico Hommes wrote:

> Carsten Ziegeler wrote:
>
>> Unico Hommes wrote:
>>
>>> I'd also like to change the protocol URL a little bit. Since the 
>>> timeout parameter will only be applicable to the delay refresher 
>>> implementation and not to the event aware one I think it would be 
>>> better to specify it with a query parameter instead.
>>>
>>> Current syntax: cache://60@main@http://www.apache.org/
>>> Proposed syntax: 
>>> cache:http://www.apache.org/?cache-expires=60&cache-name=main
>>>
>>> The protocol:subprotocol syntax is also more in line with well 
>>> established conventions such as in jdbc for instance.
>>>
>>> Let me know if you have any objections or comments.
>>>   
>>
>> No objections from me, but the parameters must have clear names, 
>> which means there shouldn't be a conflict. Imagine:
>>
>> cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500 
>>
>>
>> (Dumb example, I know) But what I mean is that the real url/source
>> could also have parameters and it must be clear which ones are
>> for the cache source and which ones are for the real source,
>> so perhaps something like "cocoon-cache..." or perhaps better
>> using invalid names like "cocoon:cache=60"?
>>
> Yeah I had been thinking along the same lines. I like the colon
> notation because it resembles familiar namespace notation. So I'll go 
> with your latter suggestion.


Does it make sense to have it both ways? So, say, you can use either:
    cache:main:60@http://www.apache.org/
or:
    cache:@http://www.apache.org/?cache:name=main&cache:expires=60
?


Vadim

Re: Event caching and CachedSource

Posted by Unico Hommes <un...@hippo.nl>.

Carsten Ziegeler wrote:

>Unico Hommes wrote:
>
>  
>
>>I'd also like to change the protocol URL a little bit. Since 
>>the timeout parameter will only be applicable to the delay 
>>refresher implementation and not to the event aware one I 
>>think it would be better to specify it with a query parameter instead.
>>
>>Current syntax: cache://60@main@http://www.apache.org/
>>Proposed syntax: 
>>cache:http://www.apache.org/?cache-expires=60&cache-name=main
>>
>>The protocol:subprotocol syntax is also more in line with 
>>well established conventions such as in jdbc for instance.
>>
>>Let me know if you have any objections or comments.
>>
>>    
>>
>No objections from me, but the parameters must have clear names, 
>which means there shouldn't be a conflict. Imagine:
>
>cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
>
>(Dumb example, I know) But what I mean is that the real url/source
>could also have parameters and it must be clear which ones are
>for the cache source and which ones are for the real source,
>so perhaps something like "cocoon-cache..." or perhaps better
>using invalid names like "cocoon:cache=60"?
>
>  
>
Yeah, I had been thinking along the same lines. I like the colon
notation because it resembles familiar namespace notation. So I'll go
with your latter suggestion.

Unico

RE: Event caching and CachedSource

Posted by Carsten Ziegeler <cz...@s-und-n.de>.

Unico Hommes wrote:

> 
> I'd also like to change the protocol URL a little bit. Since 
> the timeout parameter will only be applicable to the delay 
> refresher implementation and not to the event aware one I 
> think it would be better to specify it with a query parameter instead.
> 
> Current syntax: cache://60@main@http://www.apache.org/
> Proposed syntax: 
> cache:http://www.apache.org/?cache-expires=60&cache-name=main
> 
> The protocol:subprotocol syntax is also more in line with 
> well established conventions such as in jdbc for instance.
> 
> Let me know if you have any objections or comments.
> 
No objections from me, but the parameters must have clear names, 
which means there shouldn't be a conflict. Imagine:

cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500

(Dumb example, I know) But what I mean is that the real url/source
could also have parameters and it must be clear which ones are
for the cache source and which ones are for the real source,
so perhaps something like "cocoon-cache..." or perhaps better
using invalid names like "cocoon:cache=60"?
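
To illustrate the separation, here is a hedged Java sketch that splits
a cache: URI into the parameters meant for the cache source and the URI
passed on to the real source. The class name CacheUri, the prefix
argument, and the parameter names used in the example are assumptions;
no URL decoding is attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CacheUri {
    final String delegateUri;              // real source URI with cache parameters removed
    final Map<String, String> cacheParams; // parameters meant for the cache source

    private CacheUri(String delegateUri, Map<String, String> cacheParams) {
        this.delegateUri = delegateUri;
        this.cacheParams = cacheParams;
    }

    // Parameters whose name starts with the prefix belong to the cache source;
    // everything else is left untouched for the real source.
    static CacheUri parse(String uri, String prefix) {
        if (!uri.startsWith("cache:")) {
            throw new IllegalArgumentException("Not a cache URI: " + uri);
        }
        String rest = uri.substring("cache:".length());
        int q = rest.indexOf('?');
        String base = q < 0 ? rest : rest.substring(0, q);
        String query = q < 0 ? "" : rest.substring(q + 1);

        Map<String, String> cacheParams = new LinkedHashMap<>();
        StringJoiner remaining = new StringJoiner("&");
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            if (pair.startsWith(prefix)) {
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                cacheParams.put(name.substring(prefix.length()), value);
            } else {
                remaining.add(pair);
            }
        }
        String delegate = remaining.length() == 0 ? base : base + "?" + remaining;
        return new CacheUri(delegate, cacheParams);
    }
}
```

For example, parsing
cache:http://www.apache.org/?cocoon:cache-expires=60&expires=500 with
the prefix "cocoon:" keeps expires=500 on the real URI and hands
cache-expires=60 to the cache source, so the two can no longer clash.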

Carsten

Re: Event caching and CachedSource

Posted by Unico Hommes <un...@hippo.nl>.

Carsten Ziegeler wrote:

>Unico Hommes wrote:
>  
>
>>Hi gang :-)
>>
>>A drawback I have been running into lately with eventcache 
>>mechanism is that it lacks the ability to remove heavy 
>>processing from the critical path. An event will simply 
>>remove a set of cached pipelines from the cache completely. 
>>Making the subsequent request for such a pipeline potentially
>>very slow. In applications where isolation is not a 
>>requirement this is an unnecessary drawback.
>>
>>I am looking at the excellent CachedSource stuff that is in 
>>the scratchpad area ATM and am wondering how it fits together 
>>with the eventcache stuff. One thing I am looking into right 
>>now is to write an EventAware Refresher implementation.
>>
>>For those unfamiliar with CachedSource, it is a Source 
>>wrapper that can cache its delegate. Refreshing can be done
>>either synchronously or asynchronously but currently only 
>>based upon a specified time-out. What I'd like to do is 
>>generalize this a bit in order to add the ability to 
>>externally trigger invalidation.
>>
>>For this however I think a modification to the Refresher 
>>interface is needed.
>>
>>Instead of:
>>
>>Refresher {
>>  refresh(key,uri,timeout);
>>  periodicallyRefresh(key,uri,timeout);
>>}
>>
>>I'd like to remove timeout semantics from the interface:
>>
>>Refresher {
>>  refresh(key,uri,params);
>>}
>>
>>I don't think there is currently a reason for there being two
>>separate methods. So I think we can safely combine them
>>into one. But I guess I am looking at Carsten for confirmation... :-)
>>
>>    
>>
>Although you actually don't need my confirmation as it's not my
>but *our* source, here it is :)
>I think this makes sense and I think we should also move this
>out of the scratchpad afterwards as well.
>  
>

I'd also like to change the protocol URL a little bit. Since the timeout 
parameter will only be applicable to the delay refresher implementation 
and not to the event aware one I think it would be better to specify it 
with a query parameter instead.

Current syntax: cache://60@main@http://www.apache.org/
Proposed syntax: 
cache:http://www.apache.org/?cache-expires=60&cache-name=main

The protocol:subprotocol syntax is also more in line with
well-established conventions, such as JDBC URLs.

Let me know if you have any objections or comments.

Unico

Re: Event caching and CachedSource

Posted by Unico Hommes <un...@hippo.nl>.

Carsten Ziegeler wrote:

>Unico Hommes wrote:
>  
>
>>Hi gang :-)
>>
>>A drawback I have been running into lately with eventcache 
>>mechanism is that it lacks the ability to remove heavy 
>>processing from the critical path. An event will simply 
>>remove a set of cached pipelines from the cache completely. 
>>Making the subsequent request for such a pipeline potentially
>>very slow. In applications where isolation is not a 
>>requirement this is an unnecessary drawback.
>>
>>I am looking at the excellent CachedSource stuff that is in 
>>the scratchpad area ATM and am wondering how it fits together 
>>with the eventcache stuff. One thing I am looking into right 
>>now is to write an EventAware Refresher implementation.
>>
>>For those unfamiliar with CachedSource, it is a Source 
>>wrapper that can cache its delegate. Refreshing can be done
>>either synchronously or asynchronously but currently only 
>>based upon a specified time-out. What I'd like to do is 
>>generalize this a bit in order to add the ability to 
>>externally trigger invalidation.
>>
>>For this however I think a modification to the Refresher 
>>interface is needed.
>>
>>Instead of:
>>
>>Refresher {
>>  refresh(key,uri,timeout);
>>  periodicallyRefresh(key,uri,timeout);
>>}
>>
>>I'd like to remove timeout semantics from the interface:
>>
>>Refresher {
>>  refresh(key,uri,params);
>>}
>>
>>I don't think there is currently a reason for there being two
>>separate methods. So I think we can safely combine them
>>into one. But I guess I am looking at Carsten for confirmation... :-)
>>
>>    
>>
>Although you actually don't need my confirmation as it's not my
>but *our* source, here it is :)
>  
>

OK, thanks. Just trying to exclude the possibility of overlooking
something and to give you the opportunity to comment on any changes
beforehand.

>I think this makes sense and I think we should also move this
>out of the scratchpad afterwards as well.
>
>  
>

OK, agreed. But where should it go?

Unico

RE: Event caching and CachedSource

Posted by Carsten Ziegeler <cz...@s-und-n.de>.

Unico Hommes wrote:
> 
> Hi gang :-)
> 
> A drawback I have been running into lately with eventcache 
> mechanism is that it lacks the ability to remove heavy 
> processing from the critical path. An event will simply 
> remove a set of cached pipelines from the cache completely. 
> Making the subsequent request for such a pipeline potentially
> very slow. In applications where isolation is not a 
> requirement this is an unnecessary drawback.
> 
> I am looking at the excellent CachedSource stuff that is in 
> the scratchpad area ATM and am wondering how it fits together 
> with the eventcache stuff. One thing I am looking into right 
> now is to write an EventAware Refresher implementation.
> 
> For those unfamiliar with CachedSource, it is a Source 
> wrapper that can cache its delegate. Refreshing can be done
> either synchronously or asynchronously but currently only 
> based upon a specified time-out. What I'd like to do is 
> generalize this a bit in order to add the ability to 
> externally trigger invalidation.
> 
> For this however I think a modification to the Refresher 
> interface is needed.
> 
> Instead of:
> 
> Refresher {
>   refresh(key,uri,timeout);
>   periodicallyRefresh(key,uri,timeout);
> }
> 
> I'd like to remove timeout semantics from the interface:
> 
> Refresher {
>   refresh(key,uri,params);
> }
> 
> I don't think there is currently a reason for there being two
> separate methods. So I think we can safely combine them
> into one. But I guess I am looking at Carsten for confirmation... :-)
> 
Although you actually don't need my confirmation as it's not my
but *our* source, here it is :)
I think this makes sense and I think we should also move this
out of the scratchpad afterwards as well.

Carsten