Posted to dev@cocoon.apache.org by Giacomo Pati <gi...@apache.org> on 2001/02/15 18:20:36 UTC

Re: [RT] sharing latest research production (long!)]

Stefano Mazzocchi wrote:
>
> Content Aggregation
> -------------------
>
> The future of the web is content aggregation: web services, application
> service providers, metadata providers, smart search engines, etc... all
> will be able to serve XML documents that will have to be first
> 'aggregated', then 'expanded', sometimes 'transformed' and finally
> 'serialized' to the client.
>
> How does this fit into Cocoon vision of pipeline components? I think
> pretty nicely.
>
> Suppose to have something like this:
>
>  <layout:layout>
>   <layout:area name="quotes">
>    <layout:content
>     xinclude:href="http://nasdaq.com/quotes.xml"
>     xinclude:namespace="http://nasdaq.com/quotes.xml"/>
>   </layout:area>
>   <layout:area name="email">
>    <layout:content xinclude:href="/webmail/newmail/count"/>
>   </layout:area>
>   <layout:area name="main">
>    <html:h2>Welcome to my pretty aggregated page</html:h2>
>    <html:p>blah blah</html:p>
>    ...
>   </layout:area>
>  </layout:layout>

Well, to be honest, there is nothing essentially new in the above (except 
the namespacing). It's still the old proposal (which is good, IMHO, anyway).

>
>                         ---------- o ----------
>
> Caching
> -------
>

Ok, this is quite a huge academic explanation of caching. I sat down 
with a friend of mine to analyze your formulas etc.

>                                     - o -
>
> A general caching model for a single cacheable resource production is:
>
>       +-----(no)------------------> production --------------+
>
>  -> cache? -(yes)-> valid? -(yes)----------------> lookup ---+-->
>
>                       +-----(no)--> production --> storage --+
>
> Thus, given
>
>  r        := resource
>  t        := time of the request
>  V(r,t)   := validity estimation cost
>  P(r,t)   := production cost when caching is not performed
>  Pc(r,t)  := production cost when caching is performed
>  l(r,t)   := lookup cost
>  s(r,t)   := storage cost
>
> and
>
>  I := sequence of invalid request times [1,n]
>  V := sequence of valid request times [1,m]
>
> we have that
>
>                n
>                --
>  Invalid(r) =  \  (V(r,I ) + Pc(r,I ) + s(r,I ))
>                /        i          i         i
>                --
>               i = 1
>
> and
>
>              m
>              --
>  Valid(r) =  \  (V(r,V ) + l(r,V ))
>              /        j         j
>              --
>            j = 1
>
> and
>
>             n                m
>             --               --
>  Ptot(r) =  \  P(r,I )   +   \  P(r,V )
>             /       i        /       j
>             --               --
>           i = 1             j = 1
>
> caching is efficient if and only if
>
>  Invalid(r) + Valid(r) < Ptot(r)              [1]

<snip/>

>  Ctot(r) = Valid(r) + Invalid(r)
>
> the cache efficiency for the given resource is given by
>
>  eff(r) = Ptot(r) - Ctot(r)

<snip/>

> At this point, it's not possible to go further unless a few properties are
> assumed on the system. We will concentrate on systems where costs and
> ergodic periodicity exhibit time locality.
>
> This means that the cost of a state is presumed equal to the last known
> cost for that state and the ergodic periodicity of the resource equals
> the average ergodic periodicity exhibited until then.

With these properties (the periodicity property is not needed, but we assume
P(r) = Pc(r)) we get:

Invalid(r)   = n(V(r) + P(r) + s(r))
         n,m
Valid(r)     = m(V(r) + l(r))
       n,m
eff(r)       = (n + m)P(r) - n(V(r) + P(r) + s(r)) - m(V(r) + l(r))
     n,m
             = mP(r) - (n + m)V(r) - ns(r) - ml(r)

Let's change to more common indicators:
       m
h := -----   (hit ratio)
     n + m

f := n + m   (frequency, number of requests)

and therefore

m = hf
n = (1 - h)f

eff(r)      = hfP(r) - fV(r) - (1 - h)fs(r) - hfl(r)
     f,h
            = f( h(P(r) + s(r) - l(r)) - V(r) - s(r) )

Now, let us assume that serialization (s(r)) and deserialization (l(r))
cost about the same and that the validity check is really cheap (V(r) ~= 0).

eff(r)      = f ( hP(r) - s(r) )
     f,h

Where f, h, P(r), s(r) may change over time. We could write

eff(r, t)      = f(t) ( h(t)P(r) - s(r, t) )

Caching is effective if and only if eff(r) > 0, i.e.

    s(r)
h > ----
    P(r)

So, if you use slower caching memory (s(r) 'costs' more), the required hit
ratio rises accordingly, and efficiency can easily be checked.

Overall, eff(r) is a heuristic function over basic caching performance
indicators (hits, frequency, storing/retrieving and production), which
change over time. eff(r, t) takes into consideration the whole past history
of the resource (during the life cycle of the server). So, if a resource
had a big peak in the morning, it stays 'effective' for the rest of the day.

Preferable would be a cost function which only takes the last x
seconds/minutes/hours into account.
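The simplified formula above can be sketched as a tiny Java helper (the class and method names are mine, purely illustrative; nothing here is Cocoon code):

```java
// Sketch of the simplified efficiency estimate derived above:
//   eff(r) = f * (h * P(r) - s(r))   assuming s(r) ~= l(r) and V(r) ~= 0
public class CacheEfficiency {

    /**
     * @param requests   f, total number of requests observed
     * @param hitRatio   h, fraction of requests served from the cache
     * @param production P(r), cost of producing the resource
     * @param storage    s(r), cost of storing (~ retrieving) the resource
     * @return estimated saving; caching pays off only when this is positive
     */
    static double eff(long requests, double hitRatio,
                      double production, double storage) {
        return requests * (hitRatio * production - storage);
    }

    /** Break-even point: caching is worthwhile only if h > s(r)/P(r). */
    static double breakEvenHitRatio(double production, double storage) {
        return storage / production;
    }
}
```

For instance, with P(r) = 10 and s(r) = 2 (in whatever cost units), caching only starts paying off once the hit ratio exceeds 0.2.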

<snip/>

>
>       +----------------------------------------------------------+
>
>       | Result #2:                                               |
>       |
>       | Each cacheable producer must generate the unique key of  |
>       | the resource given all the environment information at    |
>       | request time                                             |
>       |
>       |   long generateKey(Environment e);                       |
>
>       +----------------------------------------------------------+
>
> Since this method is called synchronously it must be as fast as
> possible.
>
> 64 bits (long) guarantee uniqueness even for very big sites.

Uniqueness and speed can be solved by dividing the key into two parts, where 
one part is generated by the component (e.g. a simple counter which is 
incremented whenever its state has changed) and the other part is generated 
by the cache system itself (also a simple counter, one which identifies the 
component).

>
>                                     - o -
>
> Once the key is obtained, the cache state information is queried and
> caching discrimination is performed using the algorithms described
> before.
>
> If caching is performed and the resource entry is present in the cache,
> ergodic estimation is performed by calling a method the cacheable
> producer must implement.
>
>       +----------------------------------------------------------+
>
>       | Result #3:                                               |
>       |
>       | Each cacheable producer must implement a method that     |
>       | evaluates ergodicity given the environment information   |
>       |
>       |   boolean stillValid(Environment e);                     |
>
>       +----------------------------------------------------------+
>
> This implements synchronous ergodic validation, but in some cases the
> producer is able to estimate the ergodic period in advance, allowing the
> cache system to avoid this call.

I'm missing an additional parameter to the method above (probably the key 
generated previously). I think "stillValid" must be compared against the 
state the cache system believes is valid.
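One way to read this suggestion, as a hypothetical variant of the producer contract (the real Cocoon 2 interfaces may well look different; 'Environment' here is just a placeholder):

```java
// Hypothetical variant of the producer contract, with the previously
// generated key passed back into the validity check.
interface Environment { /* request-time information */ }

interface Cacheable {
    /** Unique 64-bit key for the resource under this environment. */
    long generateKey(Environment e);

    /** Is the entry cached under 'key' still valid for this request? */
    boolean stillValid(Environment e, long key);
}

// Minimal counter-based producer: the key encodes the state counter, so
// validity reduces to "has my counter moved since the key was generated?".
class CounterProducer implements Cacheable {
    private long stateCounter = 0;

    public void stateChanged() { stateCounter++; }

    public long generateKey(Environment e) { return stateCounter; }

    public boolean stillValid(Environment e, long key) {
        return key == stateCounter;
    }
}
```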

<snip/>

>                                     - o -
>
> Assuming that memory represents an ordered collection of randomly
> accessible bytes, the act of 'storing a resource' and 'getting a
> resource' imply the act of 'serialization' and 'deserialization' of the
> resource.
>
> In many cases these operations are implicit, or done by the system
> (think about Java object serialization capabilities). For Cocoon2, the
> resources are represented as a stream of SAX events and the operation of
> serialization/deserialization must be carefully designed.
>
> A possible choice for such serialization/deserialization action is the
> XML compilation method I designed a while back. One objection on the use
> of this method for caching was its specific asymmetry: SAX
> serialization (compilation) is slightly slower than deserialization
> (interpretation).

Here you say "slightly slower", and this is why we assumed in the 
formulas above that the two costs roughly cancel out. They are (more or 
less) directly dependent on the amount of SAX events.

> After careful thinking, I believe that:
>
> 1) serialization/deserialization should be pluggable

Always a good idea.

> 2) the cache control system will adapt on the latency and automatically
> adjust the caching efficiency when the SAX serialization/deserialization
> code is not suited for that resource.
>
> This said, SAX compilation can be made faster by removing compression at
> the expense of using more cache memory.
>
<snip/>

> Before answering the question, let us observe that given a transformer
>
>  a ---> T ---> b
>
> the result of the production b can be written as
>
>  b = T(a(t),t)
>
> assuming that the behavior of the function doesn't change with time
> (this would require code recompilation!) and separating the changeable
> parts, the above can be simplified in

This assumption might be acceptable for an XSLT transformer but almost every 
other specialized transformer can and will change over time (I18nTransformer, 
SQLTransformer).

I still need some time to go over the rest of your proposal.

Giacomo

Re: [RT] sharing latest research production (long!)]

Posted by Paul Russell <pa...@luminas.co.uk>.
* Carsten Ziegeler (cziegeler@sundn.de) wrote :
> Ok, perhaps I should look more carefully at the logicsheets then. Is it possible
> to combine the logicsheets in a way that the "result" of one logicsheet is processed
> by another one? E.g. having the SQLTransformer and the I18nTransformer. The SQLTransformer
> fetches data from the database (with information for more than one language) and the
> I18nTransformer filters this result for the correct language. Is this possible with
> logicsheets? (I know that this example is a bit lame - but for lack of time it's the first
> that came to mind).

Yep. For example, it is common to do:

<esql:query>
	SELECT foo FROM wobble WHERE something = <req:get-parameter ../>
</esql:query>

(don't quote me on the syntax, I've never used the ESQL logicsheet
myself)


P.

-- 
Paul Russell                                 Email:   paul@luminas.co.uk
Technical Director                             Tel:  +44 (0)20 8553 6622
Luminas Internet Applications                  Fax:  +44 (0)870 28 47489
This is not an official statement or order.    Web:    www.luminas.co.uk

Re: AW: [RT] sharing latest research production (long!)]

Posted by Stefano Mazzocchi <st...@apache.org>.
Paul Russell wrote:
> 
> * Carsten Ziegeler (cziegeler@sundn.de) wrote :
> > > <long snip>
> > > ..
> > > Stefano Mazzocchi wrote:
> > > > Giacomo Pati wrote
> > > > This assumption might be acceptable for an XSLT transformer but almost
> > > every
> > > > other specialized transformer can and will change over time
> > > (I18nTransformer,
> > > > SQLTransformer).
> > >
> > > Yes, this is why I suggest to place all these 'specialized'
> > > transformation capabilities at generation time.
> > >
> > Do you mean, not to have those transformers and instead have some corresponding generators?
> > But that would reduce the transformation stage of the pipeline to pure stylesheet transformation.
> 
Not *all* transformers. I think Stefano means to avoid situations where
the transformation acts on commands contained within the content stream
to retrieve content from elsewhere without using xinclude.

Yes, this is a good explanation.

The "external inclusion" is the key: a transformer that goes someplace
'outside cocoon' to grab content using the information that comes thru
the pipeline as SAX events is almost impossible to cache efficiently.

Why? because you'd have to pass the SAX events thru to let the
transformer tell you if the cached content is still valid or not.

The XIncludeFilter and the SQLTransformer do that.... but the
I18nTransformer does not (as far as I know, never looked into the code).

So, let me restate: if you write a transformer that 'goes outside' on a
location identified by SAX events coming in, then I suggest to turn it
into a generator.

Which makes sense semantically: in fact you are not transforming
anything, but rather generating using XML configurations.

XInclude is the only exception to this rule, and for this reason it will
be implemented directly into the pipeline machinery, so you get it for free
and don't have to care about xincluded fragment caching since the
sitemap will do that for you.

> I must admit,
> bits of this are slightly over my head -- I'm not entirely sure what
> Stefano means by atomic in this context, for example. Stefano, am I
> understanding you right?

Yes. 'Atomic' means that the transformer can answer the question 'have
you changed your behavior' without passing the input stream.

The SQLTransformer is not atomic because it needs the input stream to
understand where the database is, but the I18nTransformer (even if it goes
out to an external i18n database) doesn't need the input to understand
whether its own behavior has changed or not.
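The distinction could be illustrated like this, with an I18n-style component whose behavior depends only on its own state (hypothetical names, not actual Cocoon interfaces):

```java
// Illustration of 'atomic' in the sense above: this transformer's behavior
// depends only on its own dictionary version, never on the incoming SAX
// stream, so it can answer "have you changed?" without seeing any input.
interface AtomicTransformer {
    long currentState();
    boolean behaviorChangedSince(long lastKnownState);
}

class DictionaryTransformer implements AtomicTransformer {
    private long dictionaryVersion = 1;

    /** e.g. the external i18n dictionary was reloaded */
    public void reloadDictionary() { dictionaryVersion++; }

    public long currentState() { return dictionaryVersion; }

    public boolean behaviorChangedSince(long lastKnownState) {
        return dictionaryVersion != lastKnownState;
    }
}
```

A non-atomic transformer (like the SQLTransformer) could not implement such a method: the database it queries is named inside the incoming SAX stream, so validity can only be decided after the input has been replayed.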

> > And this in turn would reduce the flexibility of the pipeline:
> > How could I then generate a pipeline which uses more than one (former) transformer,
> > e.g. the SQLTransformer and the I18nTransformer? This would lead to a huge mess of generators
> > for all possible combinations.
> 
> Why would you use the SQL transformer? It's now fairly well deprecated
> in favour of the ESQL logicsheet. I'm not sure about the i18n
> transformer, because I've not looked at it in detail, but it may well be
> that we can create a logicsheet for that, too.

Please, do not misunderstand my words: I *NEVER* said that *ALL*
transformers should be turned into generators or taglibs, that would be
totally stupid and ruin the concept of pipelines alltogether.

I'm saying that transformers should be atomic as I previously defined to
guarantee caching efficiency.

Is this a strong requirement? Well, I don't think so: most non-atomic
transformers should be generators anyway, and those 'real' transformers
are almost automatically atomic.

Please, people, base your judgement on the full document instead of a
few decontextualized fragments in a reply.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------



Re: [RT] sharing latest research production (long!)]

Posted by Carsten Ziegeler <cz...@sundn.de>.
> Paul Russell wrote:
> * Carsten Ziegeler (cziegeler@sundn.de) wrote :
>
> Not *all* transformers. I think Stefano means to avoid situations where
> the transformation acts on commands contained within the content stream
> to retrieve content from elsewhere without using xinclude. I must admit,
> bits of this are slightly over my head -- I'm not entirely sure what
> Stefano means by atomic in this context, for example. Stefano, am I
> understanding you right?
> 
> > And this in turn would reduce the flexibility of the pipeline:
> > How could I then generate a pipeline which uses more than one (former)
> transformer,
> > e.g. the SQLTransformer and the I18nTransformer? This would lead to a
> huge mess of generators
> > for all possible combinations.
> 
> Why would you use the SQL transformer? It's now fairly well deprecated
> in favour of the ESQL logicsheet. I'm not sure about the i18n
> transformer, because I've not looked at it in detail, but it may well be
> that we can create a logicsheet for that, too.
Ok, perhaps I should look more carefully at the logicsheets then. Is it possible
to combine the logicsheets in a way that the "result" of one logicsheet is processed
by another one? E.g. having the SQLTransformer and the I18nTransformer. The SQLTransformer
fetches data from the database (with information for more than one language) and the
I18nTransformer filters this result for the correct language. Is this possible with
logicsheets? (I know that this example is a bit lame - but for lack of time it's the first
that came to mind).


Carsten 


Re: AW: [RT] sharing latest research production (long!)]

Posted by Paul Russell <pa...@luminas.co.uk>.
* Carsten Ziegeler (cziegeler@sundn.de) wrote :
> > <long snip>
> > ..
> > Stefano Mazzocchi wrote:
> > > Giacomo Pati wrote
> > > This assumption might be acceptable for an XSLT transformer but almost
> > every
> > > other specialized transformer can and will change over time
> > (I18nTransformer,
> > > SQLTransformer).
> > 
> > Yes, this is why I suggest to place all these 'specialized'
> > transformation capabilities at generation time.
> > 
> Do you mean, not to have those transformers and instead have some corresponding generators?
> But that would reduce the transformation stage of the pipeline to pure stylesheet transformation.

Not *all* transformers. I think Stefano means to avoid situations where
the transformation acts on commands contained within the content stream
to retrieve content from elsewhere without using xinclude. I must admit,
bits of this are slightly over my head -- I'm not entirely sure what
Stefano means by atomic in this context, for example. Stefano, am I
understanding you right?

> And this in turn would reduce the flexibility of the pipeline:
> How could I then generate a pipeline which uses more than one (former) transformer, 
> e.g. the SQLTransformer and the I18nTransformer? This would lead to a huge mess of generators
> for all possible combinations.

Why would you use the SQL transformer? It's now fairly well deprecated
in favour of the ESQL logicsheet. I'm not sure about the i18n
transformer, because I've not looked at it in detail, but it may well be
that we can create a logicsheet for that, too.


P.
-- 
Paul Russell                                 Email:   paul@luminas.co.uk
Technical Director                             Tel:  +44 (0)20 8553 6622
Luminas Internet Applications                  Fax:  +44 (0)870 28 47489
This is not an official statement or order.    Web:    www.luminas.co.uk

AW: [RT] sharing latest research production (long!)]

Posted by Carsten Ziegeler <cz...@sundn.de>.
> <long snip>
> ..
> Stefano Mazzocchi wrote:
> > Giacomo Pati wrote
> > This assumption might be acceptable for an XSLT transformer but almost
> every
> > other specialized transformer can and will change over time
> (I18nTransformer,
> > SQLTransformer).
> 
> Yes, this is why I suggest to place all these 'specialized'
> transformation capabilities at generation time.
> 
Do you mean, not to have those transformers and instead have some corresponding generators?
But that would reduce the transformation stage of the pipeline to pure stylesheet transformation.

And this in turn would reduce the flexibility of the pipeline:
How could I then generate a pipeline which uses more than one (former) transformer, 
e.g. the SQLTransformer and the I18nTransformer? This would lead to a huge mess of generators
for all possible combinations.

Is this what you meant - or did I get something wrong?

Carsten


Re: [RT] sharing latest research production (long!)]

Posted by Stefano Mazzocchi <st...@apache.org>.
Giacomo Pati wrote:
> 
> Stefano Mazzocchi wrote:
> >
> > Content Aggregation
> > -------------------
> >
> > The future of the web is content aggregation: web services, application
> > service providers, metadata providers, smart search engines, etc... all
> > will be able to serve XML documents that will have to be first
> > 'aggregated', then 'expanded', sometimes 'transformed' and finally
> > 'serialized' to the client.
> >
> > How does this fit into Cocoon vision of pipeline components? I think
> > pretty nicely.
> >
> > Suppose to have something like this:
> >
> >  <layout:layout>
> >   <layout:area name="quotes">
> >    <layout:content
> >     xinclude:href="http://nasdaq.com/quotes.xml"
> >     xinclude:namespace="http://nasdaq.com/quotes.xml"/>
> >   </layout:area>
> >   <layout:area name="email">
> >    <layout:content xinclude:href="/webmail/newmail/count"/>
> >   </layout:area>
> >   <layout:area name="main">
> >    <html:h2>Welcome to my pretty aggregated page</html:h2>
> >    <html:p>blah blah</html:p>
> >    ...
> >   </layout:area>
> >  </layout:layout>
> 
> Well, to be honest, there is nothing essentially new in the above (except
> namespacing). Still the old proposal (which is good IMHO anyway).

Yes, correct, but I believe namespacing will play an important role in
the future, especially when you'll aggregate external resources you
can't control.
 
> >
> >                         ---------- o ----------
> >
> > Caching
> > -------
> >
> 
> Ok, this is quite a huge academic explanation of caching. I was sitting
> together with a friend of mine to analyze your formular etc.
> 
> >                                     - o -
> >
> > A general caching model for a single cacheable resource production is:
> >
> >       +-----(no)------------------> production --------------+
> >
> >  -> cache? -(yes)-> valid? -(yes)----------------> lookup ---+-->
> >
> >                       +-----(no)--> production --> storage --+
> >
> > Thus, given
> >
> >  r        := resource
> >  t        := time of the request
> >  V(r,t)   := validity estimation cost
> >  P(r,t)   := production cost when caching is not performed
> >  Pc(r,t)  := production cost when caching is performed
> >  l(r,t)   := lookup cost
> >  s(r,t)   := storage cost
> >
> > and
> >
> >  I := sequence of invalid request times [1,n]
> >  V := sequence of valid request times [1,m]
> >
> > we have that
> >
> >                n
> >                --
> >  Invalid(r) =  \  (V(r,I ) + Pc(r,I ) + s(r,I ))
> >                /        i          i         i
> >                --
> >               i = 1
> >
> > and
> >
> >              m
> >              --
> >  Valid(r) =  \  (V(r,V ) + l(r,V ))
> >              /        j         j
> >              --
> >            j = 1
> >
> > and
> >
> >             n                m
> >             --               --
> >  Ptot(r) =  \  P(r,I )   +   \  P(r,V )
> >             /       i        /       j
> >             --               --
> >           i = 1             j = 1
> >
> > caching is efficient if and only if
> >
> >  Invalid(r) + Valid(r) < Ptot(r)              [1]
> 
> <snip/>
> 
> >  Ctot(r) = Valid(r) + Invalid(r)
> >
> > the cache efficiency for the given resource is given by
> >
> >  eff(r) = Ptot(r) - Ctot(r)
> 
> <snip/>
> 
> > At this point, it's not possible to go further unless a few properties are
> > assumed on the system. We will concentrate on systems where costs and
> > ergodic periodicity exhibit time locality.
> >
> > This means that the cost of a state is presumed equal to the last known
> > cost for that state and the ergodic periodicity of the resource equals
> > the average ergodic periodicity exhibited until then.
> 
> With these properties (periodicity property is not needed, but we assume
> 
> P(r) = Pc(r)) we get:

Oh, this might not always be true, be careful.
 
> Invalid(r)   = n(V(r) + P(r) + s(r))
>          n,m

Careful, you are moving into average values! How do you know them? How
do you know these values have a variance that is small enough for your
calculations to hold true?

I used sums because the only thing you know is the value at that moment.
Everything else must adapt using feedback.

> Valid(r)     = m(V(r) + l(r))
>        n,m
> eff(r)       = (n + m)P(r) - n(V(r) + P(r) + s(r)) - m(V(r) + l(r))
>      n,m
>              = mP(r) - (n + m)V(r) - ns(r) -ml(r)
> 
> Let's change to more common indicators:
>        m
> h := -----   (hit ratio)
>      n + m
> 
> f := n + m   (frequency, number of requests)
> 
> and therefore
> 
> m = hf
> n = (1 -h)f
> 
> eff(r)      = hfP(r) - fV(r) - (1 - h)fs(r) - hfl(r)
>      f,h
>             = f( h(P(r) + s(r) - l(r)) - V(r) - s(r) )
> 
> Now, let us assume that serialization (s(r)) and deserialization (l(r))
> cost about the same and that the validity check is really cheap (V(r) ~= 0).
> 
> eff(r)      = f ( hP(r) - s(r) )
>      f,h
> 
> Where f, h, P(r), s(r) may change over time. We could write
> 
> eff(r, t)      = f(t) ( h(t)P(r) - s(r, t) )
> 
> Caching is effective if and only if eff(r) > 0, i.e.
> 
>     s(r)
> h > ----
>     P(r)
> 
> So, if you use slower caching memory (s(r) 'costs' more), the required hit
> ratio rises accordingly, and efficiency can easily be checked.

This is probably much easier to understand, but it hides *many*
statistical holes and assumes total knowledge of all parameters (and
their statistical properties) at all times.

Also, it assumes that s(r) and P(r) can be determined independently,
which is rarely the case, especially when serialization to disk is
performed "during event production" and not afterwards.

> Overall, eff(r) is a heuristic function over basic caching performance
> indicators (hits, frequency, storing/retrieving and production)

Even if based on averages, this treatment has the big merit of showing that
the concept of 'hits and frequency' is already contained in the model,
even if implicitly. This answers Robin's questions and shows that adding
an explicit dependency on frequency would break the balance between the
different performance indicators.

>, which
> change over time. eff(r, t) takes into consideration the whole past history
> of the resource (during the life cycle of the server). So, if a resource
> had a big peak in the morning, it stays 'effective' for the rest of the day.
> 
> Preferable would be a cost function which only takes the last x
> seconds/minutes/hours into account.

This is a good point: use 'windowed efficiencies' instead of 'globally
integral' ones.

Hmmm,  I have to think about this more... probably I'll test these
concepts in the 'virtual cache' I'll implement over the weekend.
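Such a 'windowed' efficiency could look like this sketch, which applies the simplified eff = f(hP(r) - s(r)) formula from earlier in the thread only to requests inside a sliding time window (illustrative code under that simplified cost model, not a proposed implementation):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a 'windowed' efficiency estimator: only requests seen within
// the last windowMillis contribute to f and h, so a morning peak no longer
// keeps a resource looking cacheable for the rest of the day.
class WindowedEfficiency {
    private final long windowMillis;
    // each entry is {timestamp, hit ? 1 : 0}
    private final Deque<long[]> events = new ArrayDeque<>();

    WindowedEfficiency(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    void record(long now, boolean hit) {
        events.addLast(new long[] { now, hit ? 1 : 0 });
        prune(now);
    }

    private void prune(long now) {
        // drop events that have fallen out of the window
        while (!events.isEmpty() && events.peekFirst()[0] < now - windowMillis)
            events.removeFirst();
    }

    /** Windowed eff(r) = f(hP(r) - s(r)); positive means caching pays off. */
    double eff(long now, double production, double storage) {
        prune(now);
        long f = events.size();
        if (f == 0) return 0.0;
        long hits = 0;
        for (long[] e : events) hits += e[1];
        double h = (double) hits / f;
        return f * (h * production - storage);
    }
}
```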
 
> <snip/>
> 
> >
> >       +----------------------------------------------------------+
> >
> >       | Result #2:                                               |
> >       |
> >       | Each cacheable producer must generate the unique key of  |
> >       | the resource given all the environment information at    |
> >       | request time                                             |
> >       |
> >       |   long generateKey(Environment e);                       |
> >
> >       +----------------------------------------------------------+
> >
> > Since this method is called synchronously it must be as fast as
> > possible.
> >
> > 64 bits (long) guarantee uniqueness even for very big sites.
> 
> Uniqueness and speed can be solved by dividing the key into two parts where
> one part is generated by the component (ie. simple counter which is
> incremented whenever its state has changed) and the other part is generated
> by the cache system itself (also a simple counter which identifies the
> component).

Yeah, look at my previous email.

[...snip...]
 
> Here you say "slightly slower" and this is why we have assumed in the
> formulas above that they can be ignored. They are (more or less) directly
> dependant of the amount of SAX events.

Not always. There are many compression algorithms that are much less
expensive during decompression than during compression; this is because
entropy estimation is normally expensive and requires computation that
is not done during decompression. The kind of entropy estimation that my
SAX compilation code performs is trivial (it saves an index for element
names that have already passed thru, instead of the strings), but it does
have a small price.

If we apply more complex algorithms (imagine Huffman coding on
element names, along with Lempel-Ziv coding after a small-block
Burrows-Wheeler transformation on the char streams), the expense of
caching becomes significant, while it doesn't hurt deserialization
performance that much.

Yes, one big objection is that memory cost will always be cheaper than
computation cost: it's much easier to add RAM than CPUs. This removes
the need for compression during serialization and thus balances the
costs of serialization and deserialization.
 
> > After careful thinking, I believe that:
> >
> > 1) serialization/deserialization should be pluggable
> 
> Always good idea
> 
> > 2) the cache control system will adapt on the latency and automatically
> > adjust the caching efficiency when the SAX serialization/deserialization
> > code is not suited for that resource.
> >
> > This said, SAX compilation can be made faster by removing compression at
> > the expense of using more cache memory.
> >
> <snip/>
> 
> > Before answering the question, let us observe that given a transformer
> >
> >  a ---> T ---> b
> >
> > the result of the production b can be written as
> >
> >  b = T(a(t),t)
> >
> > assuming that the behavior of the function doesn't change with time
> > (this would require code recompilation!) and separating the changeable
> > parts, the above can be simplified in
> 
> This assumption might be acceptable for an XSLT transformer but almost every
> other specialized transformer can and will change over time (I18nTransformer,
> SQLTransformer).

Yes, this is why I suggest to place all these 'specialized'
transformation capabilities at generation time.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------