Posted to dev@cocoon.apache.org by Carsten Ziegeler <cz...@apache.org> on 2008/12/22 17:29:49 UTC

[C3] Caching

Hi,

I just started with looking closer at the pipeline stuff we have for c3
atm. My first impression was that there are too many interfaces which
might confuse users :) As we step away from just a sax based pipeline, I
fear we really might need all these interfaces :(

The other thing is caching: I guess in many scenarios caching is not
done on the pipeline or even pipeline component level. So I think we
should try to make caching more optional - I know that it is optional
from a feature point but I would like to have the pipeline jar as small
as possible and move the caching stuff into an optional/additional lib.
I have no good idea atm how to do this as the pipeline components themselves
need to be aware of caching. But perhaps someone else has?

Regards
Carsten
--
Carsten Ziegeler
cziegeler@apache.org

Re: [C3] Caching

Posted by Carsten Ziegeler <cz...@apache.org>.

Peter Hunsberger wrote:
> On Mon, Dec 22, 2008 at 10:29 AM, Carsten Ziegeler <cz...@apache.org> wrote:
>> Hi,
>>
>> I just started with looking closer at the pipeline stuff we have for c3
>> atm. My first impression was that there are too many interfaces which
>> might confuse users :) As we step away from just a sax based pipeline, I
>> fear we really might need all these interfaces :(
>>
>> The other thing is caching: i guess in many scenarios caching is not
>> done on the pipeline or even pipeline component level. So I think we
>> should try to make caching more optional - I know that it is optional
>> from a feature point but I would like to have the pipeline jar as small
>> as possible and move the caching stuff into an optional/additional lib.
>> I have no good idea atm how to do this as the pipeline components itself
>> need to be aware of caching. But perhaps someone else has?
> 
> Sounds like a perfect opportunity for some Aspect Oriented coding........ ;-)
> 

:) Yepp, I had the same feeling - but I'm not sure if this makes the
thing easier.
Especially as I would like to use the stuff in an OSGi environment where
it is a little bit harder to use AOP (I know, it's possible and people
are working on improving it).

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [C3] Caching

Posted by Peter Hunsberger <pe...@gmail.com>.

On Mon, Dec 22, 2008 at 10:29 AM, Carsten Ziegeler <cz...@apache.org> wrote:
> Hi,
>
> I just started with looking closer at the pipeline stuff we have for c3
> atm. My first impression was that there are too many interfaces which
> might confuse users :) As we step away from just a sax based pipeline, I
> fear we really might need all these interfaces :(
>
> The other thing is caching: i guess in many scenarios caching is not
> done on the pipeline or even pipeline component level. So I think we
> should try to make caching more optional - I know that it is optional
> from a feature point but I would like to have the pipeline jar as small
> as possible and move the caching stuff into an optional/additional lib.
> I have no good idea atm how to do this as the pipeline components itself
> need to be aware of caching. But perhaps someone else has?

Sounds like a perfect opportunity for some Aspect Oriented coding........ ;-)


-- 
Peter Hunsberger

Re: [C3] Caching

Posted by Carsten Ziegeler <cz...@apache.org>.

Hi,

Steven Dolg wrote:
> Actually that is all we have right now.
> Pipeline and PipelineComponent. With PipelineComponent being
> differentiated into Starter, Finisher and Consumer/Producer - what you
> called the "middle parts". (Actually there is still a design flaw: the
> consumer is no PipelineComponent, but that'll be corrected presently).
> Consumer and Producer are separated into two interfaces, as to not
> require special handling when linking the Starter with the second
> component or the second to last with the Finisher.
> 
> I like to call these the "structural layer", because they define the
> structure of any pipeline:
>    * first component is always a "Starter"
>    * last component is always a "Finisher"
>    * any number pairs of "Producer"/"Consumer" between them
> 
> Every (valid) pipeline is composed in this way - regardless of what
> content or which implementation.
> 
> Thus these interfaces have no relation to anything that is content-based.
> But that's also why we need additional layers, because otherwise the
> pipeline would be specific for every type of content (which it should not)
> 
I like the explanation :) Yes, it makes sense this way.

> All the XML... interfaces are already the "content layer" (in this case
> for SAX) on top of the "strcutural layer"
> It defines how the components communicate with each other:
>    1. XMLConsumer is a ContentHandler and LexicalHandler
>    2. XMLProducer can accept an XMLConsumer and provide it with
> appropriate data
> 
> Follow (basically) by AbstractXMLProducer (implementation of 2. from
> above) and AbstractTransformer (AbstractXMLProducer + XMLConsumer).
> The "content layer" is already beyond the scope of the Pipeline API
> module - it does not care about this at all.
> This is solely the domain of the pipeline components.
> That's why the SAX layer should be removed asap from the Pipeline API.
Yepp.

> <SNIP/>
> Absolutely!
> With two types of XML components (SAX, StAX) this name is completely
> ambiguous and must be changed (I prefer SaxConsumer).
> 
> This would also emphasize that the Pipeline API is designed to be
> content-agnostic and SAX is merely one type of content it can handle.
> StAX is the first step to add another content-type - well it's still
> XML, but works completely different.
Yes.

> Not at all - I usually welcome all input!
> I just have a hard time, when I struggle to get the actual point.
:)
> 
> Trying to understand what you mean, I looked at some code and followed
> the class/interface hierarchy.
> And while doing so I found that some of this is rather strange and took
> me moment to sort out.
> And I think I know now what you mean.
Yes, it's hard to explain (at least for me) :)

> 
> But I guess it's mostly a problem with the names of the
> classes/interfaces (like "AbstractTransformer" being actually an SAX
> based Producer/Consumer) and mixing SAX with the Pipeline.
> I suppose separating those two and changing some of the names it should
> be alot better.
Yes, I guess that's all we have to do.

>>  It's
>> just about minor improvements if at all.
> I think it's actually a little bit more than that.
> Seems like I was already so accustomed to the concept in my head that I
> didn't even see, the implementation has become less clear that it could be.
This happens, but here we have the advantage of the community :)

> So PLEASE keep on doing this!
> And don't feel bad when my answers appear a little harsh - I rarely mean
> it that way... ;-)
No problem with that :)

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [C3] Caching

Posted by Steven Dolg <st...@indoqa.com>.

Carsten Ziegeler schrieb:
> Steven Dolg wrote:
>   
>> Regarding the whole topic
>> In your first mail you wrote: "there are too many interfaces which might
>> confuse users"
>> Mind explaining what you mean?
>>     
> Sure, see below :)
>   
>> IMO a user won't have to deal with many of the interfaces at all - even
>> if (s)he uses the pipeline API directly (iow programmatically), not to
>> mention when using the sitemap.
>> Someone who actually deals with the Cocoon code shouldn't have too much
>> trouble - especially when comparing it to Cocoon 2.x.
>>
>> IIRC this isn't the first time that someone said "Cocoon 3 is already
>> too complicated because of too many/much ..." (e.g. interfaces, modules,
>> complexity etc).
>>     
> I think, I didn't say that (at least not directly) :)
>
>   
>> I mean Cocoon 2.x is at least 5 times as much as Cocoon 3 - no matter
>> what kind of metric you use (I'm just guessing here, didn't actually
>> measure).
>> So how come a considerably smaller/simpler approach is suddenly too
>> complicated or too confusing?
>>     
> Again, I was just trying to make the point that there are already a lot
> of interfaces, abstract classes and classes. And I also said that given
> the functionality we want, there might be no other way.
> Comparing Cocoon 2 with the pipeline api is comparing apples with
> oranges. I'm just talking about the pipeline stuff we have and without
> further thinking one would expect four interfaces (a pipeline, a start,
> an end and middle parts)
>   
Actually that is all we have right now.
Pipeline and PipelineComponent. With PipelineComponent being 
differentiated into Starter, Finisher and Consumer/Producer - what you 
called the "middle parts". (Actually there is still a design flaw: the
Consumer is not a PipelineComponent, but that'll be corrected presently.)
Consumer and Producer are separated into two interfaces so as not to
require special handling when linking the Starter with the second
component or the second-to-last component with the Finisher.

I like to call these the "structural layer", because they define the 
structure of any pipeline:
    * first component is always a "Starter"
    * last component is always a "Finisher"
    * any number of "Producer"/"Consumer" pairs between them

Every (valid) pipeline is composed in this way - regardless of what 
content or which implementation.

Thus these interfaces have no relation to anything that is content-based.
But that's also why we need additional layers, because otherwise the 
pipeline would be specific for every type of content (which it should not)
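
As a rough illustration of the structural layer described above, here is a minimal sketch. All names and signatures (PipelineComponent, setConsumer, StructuralPipeline, and so on) are assumptions made for the example and not the actual Cocoon 3 code; Consumer is shown as a PipelineComponent, i.e. with the design flaw mentioned above already corrected.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: names and signatures are assumptions, not the real Cocoon 3 API.
interface PipelineComponent {}

// Receives content from the component before it.
interface Consumer extends PipelineComponent {}

// Hands content to the component after it.
interface Producer extends PipelineComponent {
    void setConsumer(Consumer consumer);
}

// First component: produces content and triggers processing.
interface Starter extends Producer {
    void execute();
}

// Last component: only consumes.
interface Finisher extends Consumer {}

final class StructuralPipeline {
    private final List<PipelineComponent> components = new ArrayList<>();

    void addComponent(PipelineComponent component) {
        components.add(component);
    }

    // Validates the structure and links every producer to its successor.
    void setup() {
        int size = components.size();
        if (size < 2) {
            throw new IllegalStateException("A pipeline needs a Starter and a Finisher");
        }
        if (!(components.get(0) instanceof Starter)) {
            throw new IllegalStateException("First component must be a Starter");
        }
        if (!(components.get(size - 1) instanceof Finisher)) {
            throw new IllegalStateException("Last component must be a Finisher");
        }
        for (int i = 0; i < size - 1; i++) {
            PipelineComponent current = components.get(i);
            PipelineComponent next = components.get(i + 1);
            if (!(current instanceof Producer) || !(next instanceof Consumer)) {
                throw new IllegalStateException("Cannot link component " + i + " to " + (i + 1));
            }
            ((Producer) current).setConsumer((Consumer) next);
        }
    }

    void execute() {
        setup();
        ((Starter) components.get(0)).execute();
    }
}
```

Nothing in this sketch mentions XML, SAX or any other content type; the linking code only relies on the Producer/Consumer split, which is why no special case is needed at either end of the pipeline.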

> But we have Pipeline, Consumer, Finisher, Producer, Starter,
> XMLConsumer, XMLProducer, followed by a bunch of abstract classes - and
>   
All the XML... interfaces are already the "content layer" (in this case
for SAX) on top of the "structural layer".
It defines how the components communicate with each other:
    1. XMLConsumer is a ContentHandler and LexicalHandler
    2. XMLProducer can accept an XMLConsumer and provide it with 
appropriate data

Followed (basically) by AbstractXMLProducer (implementation of 2. from
above) and AbstractTransformer (AbstractXMLProducer + XMLConsumer).
The "content layer" is already beyond the scope of the Pipeline API 
module - it does not care about this at all.
This is solely the domain of the pipeline components.
That's why the SAX layer should be removed asap from the Pipeline API.
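
To make the SAX content layer a bit more concrete, here is a small sketch built on the standard org.xml.sax interfaces. The names SaxConsumer and SaxProducer follow the renaming discussed in this thread; setSaxConsumer, UpperCaseTransformer and CollectingConsumer are hypothetical and only serve the example.

```java
import java.util.Locale;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

// 1. A SAX consumer is a ContentHandler and a LexicalHandler.
interface SaxConsumer extends ContentHandler, LexicalHandler {}

// 2. A SAX producer accepts a SaxConsumer and feeds it SAX events.
//    The method name is an assumption for this sketch.
interface SaxProducer {
    void setSaxConsumer(SaxConsumer consumer);
}

// A transformer is both: it receives events and forwards (changed) events.
// DefaultHandler2 supplies no-op implementations of all handler methods.
class UpperCaseTransformer extends DefaultHandler2 implements SaxConsumer, SaxProducer {
    private SaxConsumer next;

    @Override
    public void setSaxConsumer(SaxConsumer consumer) {
        this.next = consumer;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        next.startElement(uri, localName.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        next.endElement(uri, localName.toUpperCase(Locale.ROOT), qName.toUpperCase(Locale.ROOT));
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        next.characters(ch, start, length);
    }
}

// A terminal consumer that writes a very simplified text form of the events.
class CollectingConsumer extends DefaultHandler2 implements SaxConsumer {
    final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }
}
```

The pipeline itself never needs to see these types; only components that speak SAX depend on them, which is the argument for moving them out of the Pipeline API module.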

> even with a Cocoon 2 background, you might be a little bit lost as you
> don't see a generator, transformer or serializer as an interface.
> Now, all the stuff here is for good reason and makes sense, so I guess
> we need all of this - and in the end it also makes sense to not use the
> Cocoon 2 names. But it would be nice if we could keep this to a minimum
> and perhaps take a look if we could reduce something somewhere or
> perhaps rename something to make it even easier to use - now this is all
> a little bit vague, I know, if I would have concrete ideas I would tell
> them. It's just the feeling that what we have atm at least "looks" like
> too much.
> As Cocoon 3 was not available when I needed a simple pipeline
> implementations and as I needed at that point something very quickly, I
> wrote my own pipeline api and implementation which is just sax based  -
> and now I'm trying to map the functionality I have, to what we have in
> C3 and it took me a little bit to find out what interfaces exactly I now
> have to implement just by looking at all the names. But maybe it's just me.
>
> Perhaps moving the sax stuff to another module, having the xml-util
> module is already enough. Maybe renaming something like XMLConsumer to
> SAXConsumer helps as well.
>   
Absolutely!
With two types of XML components (SAX, StAX) this name is completely 
ambiguous and must be changed (I prefer SaxConsumer).

This would also emphasize that the Pipeline API is designed to be 
content-agnostic and SAX is merely one type of content it can handle.
StAX is the first step to add another content-type - well it's still
XML, but works completely differently.
> So please, don't consider this as a critics on the whole concept.
Not at all - I usually welcome all input!
I just have a hard time, when I struggle to get the actual point.


Trying to understand what you mean, I looked at some code and followed 
the class/interface hierarchy.
And while doing so I found that some of this is rather strange and took
me a moment to sort out.
And I think I know now what you mean.

But I guess it's mostly a problem with the names of the
classes/interfaces (like "AbstractTransformer" actually being a SAX
based Producer/Consumer) and mixing SAX with the Pipeline.
I suppose that after separating those two and changing some of the names
it should be a lot better.




>  It's
> just about minor improvements if at all.
>   
I think it's actually a little bit more than that.
Seems like I was already so accustomed to the concept in my head that I
didn't even see that the implementation has become less clear than it could be.

So PLEASE keep on doing this!
And don't feel bad when my answers appear a little harsh - I rarely mean 
it that way... ;-)

Steven
> Carsten
>

Re: [C3] Caching

Posted by Carsten Ziegeler <cz...@apache.org>.

Steven Dolg wrote:
> 
> Regarding the whole topic
> In your first mail you wrote: "there are too many interfaces which might
> confuse users"
> Mind explaining what you mean?
Sure, see below :)
> 
> IMO a user won't have to deal with many of the interfaces at all - even
> if (s)he uses the pipeline API directly (iow programmatically), not to
> mention when using the sitemap.
> Someone who actually deals with the Cocoon code shouldn't have too much
> trouble - especially when comparing it to Cocoon 2.x.
> 
> IIRC this isn't the first time that someone said "Cocoon 3 is already
> too complicated because of too many/much ..." (e.g. interfaces, modules,
> complexity etc).
I think, I didn't say that (at least not directly) :)

> I mean Cocoon 2.x is at least 5 times as much as Cocoon 3 - no matter
> what kind of metric you use (I'm just guessing here, didn't actually
> measure).
> So how come a considerably smaller/simpler approach is suddenly too
> complicated or too confusing?
Again, I was just trying to make the point that there are already a lot
of interfaces, abstract classes and classes. And I also said that given
the functionality we want, there might be no other way.
Comparing Cocoon 2 with the pipeline api is comparing apples with
oranges. I'm just talking about the pipeline stuff we have and without
further thinking one would expect four interfaces (a pipeline, a start,
an end and middle parts)

But we have Pipeline, Consumer, Finisher, Producer, Starter,
XMLConsumer, XMLProducer, followed by a bunch of abstract classes - and
even with a Cocoon 2 background, you might be a little bit lost as you
don't see a generator, transformer or serializer as an interface.
Now, all the stuff here is for good reason and makes sense, so I guess
we need all of this - and in the end it also makes sense to not use the
Cocoon 2 names. But it would be nice if we could keep this to a minimum
and perhaps take a look if we could reduce something somewhere or
perhaps rename something to make it even easier to use - now this is all
a little bit vague, I know; if I had concrete ideas I would share
them. It's just the feeling that what we have atm at least "looks" like
too much.
As Cocoon 3 was not available when I needed a simple pipeline
implementation and as I needed something very quickly at that point, I
wrote my own pipeline api and implementation which is just sax based  -
and now I'm trying to map the functionality I have, to what we have in
C3 and it took me a little bit to find out what interfaces exactly I now
have to implement just by looking at all the names. But maybe it's just me.

Perhaps moving the sax stuff to another module, having the xml-util
module is already enough. Maybe renaming something like XMLConsumer to
SAXConsumer helps as well.

So please, don't consider this as criticism of the whole concept. It's
just about minor improvements if at all.

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [C3] Caching

Posted by Steven Dolg <st...@indoqa.com>.

Carsten Ziegeler schrieb:
> Reinhard Pötz wrote:
>   
>> I would only make the caching file generator available to the sitemap.
>> If you put it into a caching pipeline, its caching interfaces will be
>> regarded, if it is used in a noncaching pipeline, the
>> CachingPipelineComponent interface will be disregarded.
>>     
> Hmm, yes, sounds much simpler :)
>
>   
>> I'm not sure if it is a good idea to introduce aspect orientation at
>> this level which adds a further level of complexity.
>>
>> I also fear that this will not be a solution that works for every
>> scenario and if it's only one component that can't be cached
>> transparently by AO mechanisms we get a problem where to put it because
>> we wouldn't want to introduce a dependency on the caching module.
>>     
> Hmm, I don't meant real aop - I haven't looked into the caching pipeline
>   
Hmmmm, if not *real* AOP what then?
> impl, but I guess it checks each component if it implements the caching
> pipeline component interface. If the component does not implement it,
> the caching pipeline could try this default behaviour.
> Without thinking this through, I see two downsides: the cache key might
> contain stuff which is not required (this should be neglectable). The
> pipeline ends up to be always cacheable, regardless if the components
> itself support caching - this might be not the desired effect - I don't
> know :)
>   
I'm not sure I would like a system where I cannot "disable" a feature,
even if I want to.
There might (iow. will) be situations where a result must be
regenerated, even if all input parameters remained the same (just
imagine using a CMS, Database, Index, etc. as input).
If the system keeps using the old result because the query for the DB 
did not change, you're screwed...

On the other hand, I guess it would be rather difficult to provide a 
caching mechanism with an optional module that can work with virtually 
any component that might exist.
Of course we can extract all the caching that exists now into a separate 
module but everything that needs caching would have to depend on it.

>   
>> If we want to clean up the cocoon-pipeline module, it's probably a
>> better idea to create a 'cocoon-sax' module and we move all SAX related
>> classes there. Then 'cocoon-pipeline' contains the core interfaces and
>> the pipeline implementations (incl. caching).
>>     
> I think we should do both :)
>   
I think we should definitely separate the SAX components from the 
pipeline. Especially with StAX coming.
I'm sure this will make the pipeline-module quite small and nice...


Regarding the whole topic
In your first mail you wrote: "there are too many interfaces which might 
confuse users"
Mind explaining what you mean?

IMO a user won't have to deal with many of the interfaces at all - even 
if (s)he uses the pipeline API directly (iow programmatically), not to 
mention when using the sitemap.
Someone who actually deals with the Cocoon code shouldn't have too much 
trouble - especially when comparing it to Cocoon 2.x.

IIRC this isn't the first time that someone said "Cocoon 3 is already 
too complicated because of too many/much ..." (e.g. interfaces, modules, 
complexity etc).
I mean Cocoon 2.x is at least 5 times as large as Cocoon 3 - no matter
what kind of metric you use (I'm just guessing here, didn't actually
measure).
So how come a considerably smaller/simpler approach is suddenly too 
complicated or too confusing?
> Carsten
>
>

Re: [C3] Caching

Posted by Carsten Ziegeler <cz...@apache.org>.

Reinhard Pötz wrote:
> I would only make the caching file generator available to the sitemap.
> If you put it into a caching pipeline, its caching interfaces will be
> regarded, if it is used in a noncaching pipeline, the
> CachingPipelineComponent interface will be disregarded.
Hmm, yes, sounds much simpler :)

> 
> I'm not sure if it is a good idea to introduce aspect orientation at
> this level which adds a further level of complexity.
> 
> I also fear that this will not be a solution that works for every
> scenario and if it's only one component that can't be cached
> transparently by AO mechanisms we get a problem where to put it because
> we wouldn't want to introduce a dependency on the caching module.
Hmm, I didn't mean real AOP - I haven't looked into the caching pipeline
impl, but I guess it checks each component to see if it implements the
caching pipeline component interface. If the component does not implement it,
the caching pipeline could try this default behaviour.
Without thinking this through, I see two downsides: the cache key might
contain stuff which is not required (this should be negligible). The
pipeline ends up always being cacheable, regardless of whether the components
themselves support caching - this might not be the desired effect - I don't
know :)

> If we want to clean up the cocoon-pipeline module, it's probably a
> better idea to create a 'cocoon-sax' module and we move all SAX related
> classes there. Then 'cocoon-pipeline' contains the core interfaces and
> the pipeline implementations (incl. caching).
I think we should do both :)

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [C3] Caching

Posted by Reinhard Pötz <re...@apache.org>.

Carsten Ziegeler wrote:
> Reinhard Pötz wrote:
>> Carsten Ziegeler wrote:
>>> Hi,
>>>
>>> I just started with looking closer at the pipeline stuff we have for c3
>>> atm. My first impression was that there are too many interfaces which
>>> might confuse users :) As we step away from just a sax based pipeline, I
>>> fear we really might need all these interfaces :(
>> I've already prepared a proposal in my drafts folder that if accepted
>> will introduce another interface ;-)
> Argh :)
> 
>> Nothing prevents us from having e.g. a FileGenerator and a
>> CacheableFileGenerator if we want to move all the caching stuff into its
>> own module. Or do I miss something?
> 
> Yes, that is an option which I immediately disregarded :) The problem I
> see is switching from non caching to caching. If you think of the old
> Cocoon sitemap, you define shorts names for the components, so you map
> "file" to the file generator. First question in this case is, which one
> (the caching or the non caching)? Do you want to define short names for
> both versions and then decide inside your pipeline? 

I would only make the caching file generator available to the sitemap.
If you put it into a caching pipeline, its caching interfaces will be
regarded; if it is used in a non-caching pipeline, the
CachingPipelineComponent interface will be disregarded.
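
A minimal sketch of that idea, assuming a hypothetical shape for the interfaces: the thread only names CachingPipelineComponent, so the cacheKey() method, the Step interface and both pipeline classes below are invented for illustration. The same component instance works in both pipelines; only the caching pipeline looks at the caching interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical processing contract for this sketch.
interface Step {
    String apply(String input);
}

// Hypothetical shape: the thread does not show the real methods.
interface CachingPipelineComponent {
    String cacheKey();
}

class NonCachingPipeline {
    protected final List<Step> steps;

    NonCachingPipeline(List<Step> steps) {
        this.steps = steps;
    }

    // Runs every step; caching interfaces are never consulted here.
    String execute(String input) {
        String value = input;
        for (Step step : steps) {
            value = step.apply(value);
        }
        return value;
    }
}

class CachingPipeline extends NonCachingPipeline {
    private final Map<String, String> cache = new HashMap<>();

    CachingPipeline(List<Step> steps) {
        super(steps);
    }

    @Override
    String execute(String input) {
        String key = buildKey(input);
        if (key == null) {
            return super.execute(input); // some step cannot be cached
        }
        String result = cache.get(key);
        if (result == null) {
            result = super.execute(input);
            cache.put(key, result);
        }
        return result;
    }

    // Returns null as soon as one step does not support caching.
    private String buildKey(String input) {
        StringBuilder key = new StringBuilder(input);
        for (Step step : steps) {
            if (!(step instanceof CachingPipelineComponent)) {
                return null;
            }
            key.append('|').append(((CachingPipelineComponent) step).cacheKey());
        }
        return key.toString();
    }
}

// One component class that serves both pipelines.
class CountingUpperCase implements Step, CachingPipelineComponent {
    int calls;

    @Override
    public String apply(String input) {
        calls++;
        return input.toUpperCase(Locale.ROOT);
    }

    @Override
    public String cacheKey() {
        return "upper";
    }
}
```

In this sketch a single non-caching step makes the whole caching pipeline fall back to plain execution; whether that is the right policy is a design choice the thread leaves open.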

> This is too
> complicated as you already have to choose the correct pipeline
> implementation.
> And we double the number of classes just to have something optional.

yes, that's the downside of my proposal.

> Perhaps we can come up with some auto-caching functionality? The full
> configuraion of a component is used to make up the cache key and if one
> of the parameters is a url we do the content modified check for this
> url. (Just a first brain dump, so this certainly needs some tweaking).
> E.g. the file generator does not need to implement any special stuff for
> caching - it just works. If a component needs special handling it can
> implement the optional CachingPipelineComponent.

I'm not sure if it is a good idea to introduce aspect orientation at
this level which adds a further level of complexity.

I also fear that this will not be a solution that works for every
scenario and if it's only one component that can't be cached
transparently by AO mechanisms we get a problem where to put it because
we wouldn't want to introduce a dependency on the caching module.

                                   - o -

If we want to clean up the cocoon-pipeline module, it's probably a
better idea to create a 'cocoon-sax' module and we move all SAX related
classes there. Then 'cocoon-pipeline' contains the core interfaces and
the pipeline implementations (incl. caching).

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [C3] Caching

Posted by Carsten Ziegeler <cz...@apache.org>.

Reinhard Pötz wrote:
> Carsten Ziegeler wrote:
>> Hi,
>>
>> I just started with looking closer at the pipeline stuff we have for c3
>> atm. My first impression was that there are too many interfaces which
>> might confuse users :) As we step away from just a sax based pipeline, I
>> fear we really might need all these interfaces :(
> 
> I've already prepared a proposal in my drafts folder that if accepted
> will introduce another interface ;-)
Argh :)

> 
> Nothing prevents us from having e.g. a FileGenerator and a
> CacheableFileGenerator if we want to move all the caching stuff into its
> own module. Or do I miss something?

Yes, that is an option which I immediately disregarded :) The problem I
see is switching from non caching to caching. If you think of the old
Cocoon sitemap, you define short names for the components, so you map
"file" to the file generator. First question in this case is, which one
(the caching or the non caching)? Do you want to define short names for
both versions and then decide inside your pipeline? This is too
complicated as you already have to choose the correct pipeline
implementation.
And we double the number of classes just to have something optional.

Perhaps we can come up with some auto-caching functionality? The full
configuration of a component is used to make up the cache key and if one
of the parameters is a URL we do the content-modified check for this
URL. (Just a first brain dump, so this certainly needs some tweaking.)
E.g. the file generator does not need to implement any special stuff for
caching - it just works. If a component needs special handling it can
implement the optional CachingPipelineComponent.
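
A rough sketch of what such auto-caching could look like, under several assumptions: the configuration is a flat map of string parameters, a parameter counts as a URL if it parses as an absolute file/http/https URI, and the last-modified lookup is passed in by the caller. AutoCacheKey and its methods are hypothetical names, not existing Cocoon code.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

// Hypothetical helper: derives a cache key from a component's configuration.
final class AutoCacheKey {
    final String key;
    final List<URI> sources; // URL-valued parameters to check for modification

    private AutoCacheKey(String key, List<URI> sources) {
        this.key = key;
        this.sources = sources;
    }

    static AutoCacheKey from(String componentName, Map<String, String> configuration) {
        StringBuilder key = new StringBuilder(componentName);
        List<URI> sources = new ArrayList<>();
        // TreeMap sorts by parameter name, so equal configurations give equal keys.
        for (Map.Entry<String, String> entry : new TreeMap<>(configuration).entrySet()) {
            key.append(';').append(entry.getKey()).append('=').append(entry.getValue());
            URI uri = asSourceUri(entry.getValue());
            if (uri != null) {
                sources.add(uri);
            }
        }
        return new AutoCacheKey(key.toString(), sources);
    }

    // Heuristic (an assumption of this sketch): only file/http/https count as URLs.
    private static URI asSourceUri(String value) {
        try {
            URI uri = new URI(value);
            if (uri.getScheme() == null) {
                return null;
            }
            String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
            boolean known = scheme.equals("file") || scheme.equals("http") || scheme.equals("https");
            return known ? uri : null;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // A cached result is valid if no source changed after it was stored.
    boolean isValid(long cachedAtMillis, ToLongFunction<URI> lastModified) {
        for (URI source : sources) {
            if (lastModified.applyAsLong(source) > cachedAtMillis) {
                return false;
            }
        }
        return true;
    }
}
```

This also shows the weak spot of the approach: a component whose output depends on something that is not visible in its parameters (a database, say) would look cacheable here although it is not.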

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [C3] Caching

Posted by Reinhard Pötz <re...@apache.org>.

Carsten Ziegeler wrote:
> Hi,
> 
> I just started with looking closer at the pipeline stuff we have for c3
> atm. My first impression was that there are too many interfaces which
> might confuse users :) As we step away from just a sax based pipeline, I
> fear we really might need all these interfaces :(

I've already prepared a proposal in my drafts folder that if accepted
will introduce another interface ;-)

> 
> The other thing is caching: i guess in many scenarios caching is not
> done on the pipeline or even pipeline component level. So I think we
> should try to make caching more optional - I know that it is optional
> from a feature point but I would like to have the pipeline jar as small
> as possible and move the caching stuff into an optional/additional lib.
> I have no good idea atm how to do this as the pipeline components itself
> need to be aware of caching. But perhaps someone else has?

Nothing prevents us from having e.g. a FileGenerator and a
CacheableFileGenerator if we want to move all the caching stuff into its
own module. Or am I missing something?

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinhard@apache.org
________________________________________________________________________

Re: [C3] Caching

Posted by Carsten Ziegeler <cz...@apache.org>.

Sylvain Wallez wrote:
> 
> I'm fearing OMS (Over Modularization Syndrome) here. Modularizing is
> good for either very different functional areas or external module
> dependencies. But in the case of caching, it is IMO a core feature of an
> efficient pipeline implementation even if there are some use cases where
> caching isn't useful or possible.
I'm not sure that caching really is a core feature in this case. Looking
back, although we have the cool-looking pipeline-based caching feature
which queries all components in the pipeline etc., I rarely used it
and instead switched to a much simpler caching of the complete pipeline
which was expiry-based or triggered from the outside. This can
easily be layered on top. But ymmv of course, so I guess it will be
difficult to get a uniform opinion on whether caching belongs to the core or not :)
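
To illustrate what "layered on top" could mean, here is a minimal sketch of a wrapper that caches the complete pipeline result, expires it after a fixed time and can also be invalidated from the outside. ExpiringResultCache is a hypothetical name; the pipeline is represented by a plain Supplier and the clock is injected so the behaviour can be tested.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical wrapper: knows nothing about the components inside the pipeline.
final class ExpiringResultCache {
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis
    private String cached;
    private long storedAt;

    ExpiringResultCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Runs the pipeline only if nothing is cached or the entry has expired.
    synchronized String get(Supplier<String> pipeline) {
        long now = clock.getAsLong();
        if (cached == null || now - storedAt >= ttlMillis) {
            cached = pipeline.get();
            storedAt = now;
        }
        return cached;
    }

    // Lets an outside trigger (e.g. a content update) drop the cached result.
    synchronized void invalidate() {
        cached = null;
    }
}
```

Because the wrapper treats the pipeline as a black box, none of the pipeline components need to be aware of caching in this variant.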

Anyway, I think the current pipeline api at least looks very complex, so
reducing it to the essentials imho makes total sense. And removing
caching from the core would remove a lot of stuff. But I'm fine with keeping
it in the core if it doesn't add any additional dependencies on third
party libs. So I would like to be able to run the pipeline stuff with no
extra deps - of course if you want to use caching, you might use an
available cache implementation. And we should move all stuff related to
caching into a single package, which imho creates a nicer overview.

Carsten

-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [C3] Caching

Posted by Sylvain Wallez <sy...@apache.org>.

Carsten Ziegeler wrote:
> Hi,
>
> I just started with looking closer at the pipeline stuff we have for c3
> atm. My first impression was that there are too many interfaces which
> might confuse users :) As we step away from just a sax based pipeline, I
> fear we really might need all these interfaces :(
>
> The other thing is caching: i guess in many scenarios caching is not
> done on the pipeline or even pipeline component level. So I think we
> should try to make caching more optional - I know that it is optional
> from a feature point but I would like to have the pipeline jar as small
> as possible and move the caching stuff into an optional/additional lib.
> I have no good idea atm how to do this as the pipeline components itself
> need to be aware of caching. But perhaps someone else has?
>   

I'm fearing OMS (Over Modularization Syndrome) here. Modularizing is 
good for either very different functional areas or external module 
dependencies. But in the case of caching, it is IMO a core feature of an 
efficient pipeline implementation even if there are some use cases where 
caching isn't useful or possible.

So caching should be part of the core pipeline feature set, but the 
cache should be an optional dependency of the pipeline object (i.e. it 
can be null) for those use cases where caching doesn't make sense.
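
A small sketch of the optional-dependency idea, with hypothetical names throughout (Cache, MapCache, OptionalCachePipeline); the actual pipeline and cache types are not shown in this thread. The caching logic lives in the pipeline class, but it is only active when a cache is supplied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal cache contract for this sketch.
interface Cache {
    String get(String key);

    void put(String key, String value);
}

// Trivial in-memory implementation, standing in for any real cache library.
class MapCache implements Cache {
    private final Map<String, String> entries = new HashMap<>();

    @Override
    public String get(String key) {
        return entries.get(key);
    }

    @Override
    public void put(String key, String value) {
        entries.put(key, value);
    }
}

class OptionalCachePipeline {
    private final Cache cache; // may be null: caching is then simply skipped

    OptionalCachePipeline(Cache cache) {
        this.cache = cache;
    }

    // 'work' stands in for running all pipeline components.
    String execute(String key, Supplier<String> work) {
        if (cache == null) {
            return work.get();
        }
        String hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        String fresh = work.get();
        cache.put(key, fresh);
        return fresh;
    }
}
```

With this shape the core library only depends on the small Cache interface, so a third-party cache implementation would be needed only by those who pass one in.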

That way we keep a core concern in the core library, but do not
require people to use it if they don't want to. But more importantly 
IMO, we avoid having a confusing multiplication of super-fine-grained 
libraries and also the increased architectural complexity needed to 
separate caching into an optional module.

Sylvain

-- 
Sylvain Wallez - http://bluxte.net