Posted to dev@cocoon.apache.org by Carsten Ziegeler <cz...@s-und-n.de> on 2003/10/28 21:22:23 UTC

[IMP] Performance problems with TraxTransformer

While debugging/profiling a very big application for our
customer I found out that the current implementation
of the TraxTransformer is slowing down caching!

Why? Well, the caching algorithm asks every sitemap
component if the cached content is still valid. The
TraxTransformer answers this question by looking
if the stylesheet has changed since the last use 
(time stamp comparison).
So far so good, but you can have imports/includes
in your xslt, so the TraxTransformer checks them
as well - and this is done by "parsing" the
xslt and looking at all includes/imports. This
parsing is done, even when the content is fetched
out of the cache. 

Due to this mechanism, each stylesheet is parsed
on every request (whether or not cached content is
used), which is unnecessary in most cases.
As we didn't use the "use-store" parameter of the
xslt transformer, this is a real performance problem!

As the parsing is very time consuming, delivering
cached content is still "slow". We had figures
where cached content took 1.5 sec (and producing
it from scratch took 1.8 sec).

With the recent changes we are down below 100ms for
delivering the cached content!
I added a "check-includes" configuration to the
TraxTransformer. If you set it to "false", imported
stylesheets are not checked for changes when
validating the cache, and you really feel the
performance difference.

So, you lose a little bit of comfort but gain a lot
of performance. And if you use it only
for production, it shouldn't be a problem anyway.
(The default is "as-is")
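
The trade-off can be sketched roughly as follows. This is only an illustration, not the TraxTransformer source: the class StylesheetValidity, its method names and the includeScanner callback are hypothetical. The callback stands in for the expensive step of parsing the XSLT to find its includes/imports, and the sketch shows that it is never invoked when the flag is off.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical illustration only; not the TraxTransformer source. */
final class StylesheetValidity {
    // Time stamp of the main stylesheet when the cached content was produced.
    private final long mainStamp;
    // Time stamps of the includes/imports when the cached content was produced.
    private final Map<String, Long> includeStamps;

    StylesheetValidity(long mainStamp, Map<String, Long> includeStamps) {
        this.mainStamp = mainStamp;
        this.includeStamps = includeStamps;
    }

    /**
     * @param currentMainStamp current time stamp of the main stylesheet
     * @param includeScanner   stand-in for parsing the XSLT and collecting
     *                         the current time stamps of all includes/imports
     * @param checkIncludes    mirrors the "check-includes" setting
     */
    boolean isValid(long currentMainStamp,
                    Supplier<Map<String, Long>> includeScanner,
                    boolean checkIncludes) {
        if (currentMainStamp != mainStamp) {
            return false; // the main stylesheet changed
        }
        if (!checkIncludes) {
            return true; // skip the costly scan of includes/imports
        }
        return includeStamps.equals(includeScanner.get());
    }
}
```

With the flag off, a changed include goes unnoticed until the main stylesheet is touched, which is the loss of comfort mentioned above.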

PS: The new feature will be released with 2.1.3 in
approx. two weeks.

Carsten


Re: [IMP] Performance problems with TraxTransformer

Posted by Antonio Gallardo <ag...@agsoftware.dnsalias.com>.
This change is very important; we need to include it in a "What is new" for
2.1.3.

Best Regards,

Antonio Gallardo

Carsten Ziegeler wrote:
> While debugging/profiling a very big application for our
> customer I found out that the current implementation
> of the TraxTransformer is slowing down caching!
>
> Why? Well, the caching algorithm asks every sitemap
> component if the cached content is still valid. The
> TraxTransformer answers this question by looking
> if the stylesheet has changed since the last use
> (time stamp comparison).
> So far so good, but you can have imports/includes
> in your xslt, so the TraxTransformer checks them
> as well - and this is done by "parsing" the
> xslt and looking at all includes/imports. This
> parsing is done, even when the content is fetched
> out of the cache.
>
> Due to this mechanism, each stylesheet is parsed
> on every request (if cached content is used or not)
> which is in most cases unnecessary.
> As we didn't use the "use-store" parameter of the
> xslt transformer this is a real performance problem!
>
> As the parsing is very time consuming, delivering
> a cached content is still "slow". We had figures,
> where a cached content took 1.5 sec (and producing
> it from scratch took 1.8 sec).
>
> With the recent changes we are down below 100ms for
> delivering the cached content!
> I added a "check-includes" configuration to the
> TraxTransformer. If you set it to "false" imported
> stylesheet are not checked for changes for the
> caching, but you really feel the performance
> difference.
>
> So, you loose a little bit comfort but gain a lot
> of performance improvements. And if you use it only
> for production, it shouldn't be a problem anyway.
> (The default is "as-is")
>
> PS: The new feature will be released with 2.1.3 in
> approx. two weeks.
>
> Carsten
>
>


Re: [IMP] Performance problems with TraxTransformer

Posted by Vadim Gritsenko <va...@verizon.net>.
Sylvain Wallez wrote:

> Vadim Gritsenko wrote:
>
>> Sylvain Wallez wrote:
>>
> <snip/>
>
>>> XSLTProcessor
>>> -------------
>>> This component's design is intrinsically bad from a cache 
>>> perspective: the only way to access validity is through 
>>> getTransformerHandlerAndValidity which always creates the 
>>> TransformerHandler even if we don't use it. Combine this with 
>>> use-store=false, and we end up reparsing the XSL at each call.
>>
>>
>>
>>
>> The only way to obtain validity is to get it from the store. If store 
>> is not present, the alternative is to *compute* validity, which 
>> involves XSLT parsing and results in templates object. It will be 
>> silly to compute validity and loose templates, that's why method 
>> returns both at once.
>>
>> If store is used, then templates are obtainer from the store for 
>> free, i.e. no CPU cycles used.
>
>
>
> Not exactly: the method returns a TransformerHandler, and not a 
> Templates (which serves as a factory to build TransformerHandlers). 
> And creating a TransformerHandler is a costly operation that is 
> useless when the pipeline is not executed and the result retrieved 
> from the cache.
>
> So a speed optimisation could consist in having 
> TransformerHandlerAndValidity create the handler lazily only when 
> requested, which would not occur if the pipeline is not executed.
>
> For this, we need TransformerHandlerAndValidity to hold the Templates 
> object.
>
> What do you think?


If creation of the handler is costly -- then yes, it makes sense.

I feel like this is not the first time we are discussing it. Last time I
was thinking about creating a pool of handlers per templates object, but
this might be just too much :)

Vadim



Re: [IMP] Performance problems with TraxTransformer

Posted by Sylvain Wallez <sy...@apache.org>.
Vadim Gritsenko wrote:

> Sylvain Wallez wrote:
>
<snip/>

>> XSLTProcessor
>> -------------
>> This component's design is intrinsically bad from a cache 
>> perspective: the only way to access validity is through 
>> getTransformerHandlerAndValidity which always creates the 
>> TransformerHandler even if we don't use it. Combine this with 
>> use-store=false, and we end up reparsing the XSL at each call.
>
>
>
> The only way to obtain validity is to get it from the store. If store 
> is not present, the alternative is to *compute* validity, which 
> involves XSLT parsing and results in templates object. It will be 
> silly to compute validity and loose templates, that's why method 
> returns both at once.
>
> If store is used, then templates are obtainer from the store for free, 
> i.e. no CPU cycles used.


Not exactly: the method returns a TransformerHandler, and not a 
Templates (which serves as a factory to build TransformerHandlers). And 
creating a TransformerHandler is a costly operation that is useless when 
the pipeline is not executed and the result retrieved from the cache.

So a speed optimisation could consist in having 
TransformerHandlerAndValidity create the handler lazily only when 
requested, which would not occur if the pipeline is not executed.

For this, we need TransformerHandlerAndValidity to hold the Templates 
object.

What do you think?
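
A rough sketch of the lazy variant, using only the standard TrAX API (javax.xml.transform). It is not the Cocoon class: LazyHandlerAndValidity, the compile helper and the plain long used as "validity" are assumptions made for illustration. The holder is assumed to live for a single request, so caching the handler inside it is safe.

```java
import java.io.StringReader;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical holder: keeps the Templates, builds the handler on demand. */
final class LazyHandlerAndValidity {
    private final SAXTransformerFactory factory;
    private final Templates templates;
    private final long validity;        // stand-in for a real validity object
    private TransformerHandler handler; // created only when first requested

    LazyHandlerAndValidity(SAXTransformerFactory factory, Templates templates, long validity) {
        this.factory = factory;
        this.templates = templates;
        this.validity = validity;
    }

    /** Cheap: never touches the handler. */
    long getValidity() {
        return validity;
    }

    boolean isHandlerCreated() {
        return handler != null;
    }

    /** Costly part, deferred until the pipeline really executes. */
    TransformerHandler getTransformerHandler() {
        if (handler == null) {
            try {
                handler = factory.newTransformerHandler(templates);
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException("Cannot create TransformerHandler", e);
            }
        }
        return handler;
    }

    /** Illustration helper: compiles a stylesheet given as a string. */
    static LazyHandlerAndValidity compile(String xslt, long validity) {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAX support in TrAX is required");
        }
        SAXTransformerFactory saxFactory = (SAXTransformerFactory) tf;
        try {
            Templates t = saxFactory.newTemplates(new StreamSource(new StringReader(xslt)));
            return new LazyHandlerAndValidity(saxFactory, t, validity);
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Cannot compile stylesheet", e);
        }
    }
}
```

When the cached result is still valid, only getValidity() is called and no handler is ever built.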

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com



Re: [IMP] Performance problems with TraxTransformer

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Thursday, October 30, 2003, at 11:42 AM, Vadim Gritsenko wrote:

> Jeremy Quinn wrote:
>
>> On Thursday, October 30, 2003, at 02:45 AM, Vadim Gritsenko wrote:
>>
>>>> use-store
>>>
>>>
>>>> Let's switch it to true an ensure the transient store is really 
>>>> transient.
>>>
>>>
>>> +1. Store is checking for Serializable (again, IIRC), which should 
>>> be enough.
>>
>>
>> We turned this on on a production box recently, it became rather 
>> unstable, usually falling down once per day.
>
>
> Can you be little bit more specific in this part -- "failing down" -- 
> what have you seen?

:)

Sorry, Vadim, it is difficult to be more specific, because we never 
tracked down the exact cause (we were in 'panic mode').

One week I decided to try adding the config to store XSLTs.
I had tested it on a low-hit development machine, and it seemed to be 
faster.
Then we found the site started going down on a regular basis.
We would start seeing lots of Broken Pipes in the logs and Tomcat would
stop serving.
Restart Tomcat and it would be down again by the end of the day.
Once that config was removed (AFAIK, I am not on that job anymore), the
problem went away.
The development server is running Tomcat 4.1.18, Cocoon 2.1m2. Recent
RedHat, Sun Java 1.4.1.02.

I could conceivably retrieve some archived logs if that would help, but 
they did not seem spectacularly illuminating at the time.

HTH

regards Jeremy


Re: [IMP] Performance problems with TraxTransformer

Posted by Vadim Gritsenko <va...@verizon.net>.
Jeremy Quinn wrote:

> On Thursday, October 30, 2003, at 02:45 AM, Vadim Gritsenko wrote:
>
>>> use-store
>>
>>
>>> Let's switch it to true an ensure the transient store is really 
>>> transient.
>>
>>
>> +1. Store is checking for Serializable (again, IIRC), which should be 
>> enough.
>
>
> We turned this on on a production box recently, it became rather 
> unstable, usually falling down once per day.


Can you be a little bit more specific about this part -- "falling down" --
what have you seen?

Vadim



Re: [IMP] Performance problems with TraxTransformer

Posted by Jeremy Quinn <je...@media.demon.co.uk>.
On Thursday, October 30, 2003, at 02:45 AM, Vadim Gritsenko wrote:

>> use-store
>> ---------
>> why in hell is use-store to false??? IIRC, it was fist set to true 
>> because the transient store was actually not transient and tried to 
>> serialize compiled XSLTs in the persistant store, which failed 
>> because these objects are not serializable.
>
>
> History [1] says:
>
>    <parameter name="use-store" value="false"/> <!-- Setting this to 
> true will crash Cocoon for now! -->
>
> What this actually mean I'm not sure and I'm happily using "use-store" 
> parameter for a long time now (currently Cocoon 2.0.4 and 2.1)
>
>
>> Let's switch it to true an ensure the transient store is really 
>> transient.
>
>
> +1. Store is checking for Serializable (again, IIRC), which should be 
> enough.
>
>

We turned this on on a production box recently, it became rather 
unstable, usually falling down once per day. This was in Cocoon 2.1 m2.

regards Jeremy


Re: [IMP] Performance problems with TraxTransformer

Posted by Vadim Gritsenko <va...@verizon.net>.
Sylvain Wallez wrote:

> Carsten Ziegeler wrote:
>
>> Due to this mechanism, each stylesheet is parsed on every request (if 
>> cached content is used or not) which is in most cases unnecessary. As 
>> we didn't use the "use-store" parameter of the xslt transformer this 
>> is a real performance problem! 
>
>
> Is the reparsing always occuring, even if when "use-store" is "true"? 
> I guess not.


Your guess is right (IIRC).


> This was discussed a while ago, and we have here the combination of 
> two bug/deficiencies:
>
> use-store
> ---------
> why in hell is use-store to false??? IIRC, it was fist set to true 
> because the transient store was actually not transient and tried to 
> serialize compiled XSLTs in the persistant store, which failed because 
> these objects are not serializable.


History [1] says:

    <parameter name="use-store" value="false"/>
    <!-- Setting this to true will crash Cocoon for now! -->

What this actually means I'm not sure, and I have been happily using the
"use-store" parameter for a long time now (currently Cocoon 2.0.4 and 2.1).


> Let's switch it to true an ensure the transient store is really transient.


+1. Store is checking for Serializable (again, IIRC), which should be 
enough.


> XSLTProcessor
> -------------
> This component's design is intrinsically bad from a cache perspective: 
> the only way to access validity is through 
> getTransformerHandlerAndValidity which always creates the 
> TransformerHandler even if we don't use it. Combine this with 
> use-store=false, and we end up reparsing the XSL at each call.


The only way to obtain validity is to get it from the store. If the store
is not present, the alternative is to *compute* validity, which involves
XSLT parsing and results in a templates object. It would be silly to
compute validity and lose the templates, that's why the method returns both
at once.

If the store is used, then templates are obtained from the store for free,
i.e. no CPU cycles used.
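
To make the store behaviour concrete, here is a minimal sketch of an in-memory (transient) store that keeps compiled templates together with the time stamp they were compiled from. It uses the standard javax.xml.transform API, but TemplatesStore, Entry and parseCount are hypothetical names and this is not Cocoon's Store component.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical transient store; not Cocoon's Store component. */
final class TemplatesStore {
    /** Compiled stylesheet plus the time stamp it was compiled from. */
    static final class Entry {
        final Templates templates;
        final long lastModified;

        Entry(Templates templates, long lastModified) {
            this.templates = templates;
            this.lastModified = lastModified;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private int parseCount; // demonstration only: how often we really parsed

    /** Returns the cached entry while the time stamp matches, else re-parses. */
    synchronized Entry get(String uri, String xslt, long lastModified) {
        Entry cached = store.get(uri);
        if (cached != null && cached.lastModified == lastModified) {
            return cached; // templates and validity for free, no parsing
        }
        try {
            Templates compiled = factory.newTemplates(new StreamSource(new StringReader(xslt)));
            parseCount++;
            Entry fresh = new Entry(compiled, lastModified);
            store.put(uri, fresh);
            return fresh;
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("Cannot compile " + uri, e);
        }
    }

    synchronized int getParseCount() {
        return parseCount;
    }
}
```

Two requests with an unchanged time stamp trigger a single parse; without such a store every request would go through newTemplates again.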

...

>> I added a "check-includes" configuration to the TraxTransformer. If 
>> you set it to "false" imported stylesheet are not checked for changes 
>> for the caching, but you really feel the performance difference.
>
>
> This way of solving the problem is hacky as it forces to choose 
> between speed and auto-reload and will often lead people to either not 
> understand why their changes are not taken into account or lead them 
> to choose the "secure" way by setting auto-releoad to false.
>
> We must refactor the XSLTProcessor so that:
> - it returns a MultiSourceValidity if needed (see in 
> o.a.c.c.source.impl in scratchpad).


This will require XSLT parsing, if above is correct.


> - getting the validity in the transient store is clearly separated 
> from getting the TransformerHandler.


If it is in the store, then templates are too. Unless you want to make
validities persist in the persistent store (templates are not serializable)
-- in this case it makes sense to separate them so you do not lose
validities on server (or CLI) restart. This should speed up CLI
processing a bit.
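
A small sketch of what persisting only the validity could look like, assuming a validity that is nothing more than a URI plus a time stamp. PersistedValidity and its methods are hypothetical; a byte array stands in for the persistent store.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical serializable validity, kept apart from the templates. */
final class PersistedValidity implements Serializable {
    private static final long serialVersionUID = 1L;

    final String uri;
    final long lastModified;

    PersistedValidity(String uri, long lastModified) {
        this.uri = uri;
        this.lastModified = lastModified;
    }

    boolean isValid(long currentLastModified) {
        return lastModified == currentLastModified;
    }

    /** Serializes this validity; the byte array stands in for a persistent store. */
    byte[] toBytes() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Restores a validity after a restart, without any compiled templates. */
    static PersistedValidity fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (PersistedValidity) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a restart the validity can be checked straight away, and the stylesheet only needs compiling if the pipeline actually has to run.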

Vadim

[1] 
http://cvs.apache.org/viewcvs.cgi/cocoon-2-historical/src/webapp/WEB-INF/cocoon.xconf?annotate=1.14



RE: [IMP] Performance problems with TraxTransformer

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Sylvain Wallez wrote
> 
> You know what? I started this refactoring on my HD at the time this 
> problem was raised, but never had the time to finish it...
> 
> >Anyway, I agree that refactoring the XSLTProcessor is a way to 
> go and that with useStore the problem is not that important.
> >
> >BUT, then even if the content is fetched from the cache, the 
> XSLT Processor is "activated" for the stylesheet, which is imho a 
> total overkill for finding out that the main stylesheet has 
> changed; so I still think this option is very very useful and 
> doesn't do any harm.
> >  
> >
> 
> Sorry, I don't understand "activated". Do you mean that a 
> TransformerHandler is created? That's exactly the design flaw I 
> pointed out.
Yes, exactly. I know that some time ago I refactored the TraxTransformer
so that the TransformerHandler was only created when it was really
needed. Then someone added the checking of the includes, and
therefore the TransformerHandler is now created every time, which
is really a performance killer.

> 
> >Just make some speed comparision with and without the flag and 
> see if it helps you as well.
> >  
> >
> 
> I'm more than sure that it helps! But I wouldn't like a temporary 
> quick'n'dirty workaround go into a release...
> 
I don't think that it is quick and dirty. It's just a "performance
tuning" that says "don't check included stylesheets during
caching". Even with the refactoring you mention this flag might
make sense. Without the flag, instead of one single (main)
stylesheet, the four/five included stylesheets are checked
as well, which is obviously five times slower. That's my point.
The default for this flag is the old behaviour anyway.

Carsten

Re: [IMP] Performance problems with TraxTransformer

Posted by Sylvain Wallez <sy...@apache.org>.
Carsten Ziegeler wrote:

>Sylvain Wallez wrote:
>
>>Carsten Ziegeler wrote:
>>
>>>While debugging/profiling a very big application for our
>>>customer I found out that the current implementation of the
>>>TraxTransformer is slowing down caching!
>>>
>>>Why? Well, the caching algorithm asks every sitemap component if
>>>the cached content is still valid. The TraxTransformer answers
>>>this question by looking if the stylesheet has changed since the
>>>last use (time stamp comparison).
>>>So far so good, but you can have imports/includes in your xslt,
>>>so the TraxTransformer checks them as well - and this is done by
>>>"parsing" the xslt and looking at all includes/imports. This
>>>parsing is done, even when the content is fetched out of the cache.
>>>
>>>Due to this mechanism, each stylesheet is parsed on every
>>>request (if cached content is used or not) which is in most cases
>>>unnecessary. As we didn't use the "use-store" parameter of the
>>>xslt transformer this is a real performance problem!
>>
>>Is the reparsing always occuring, even if when "use-store" is "true"? I
>>guess not.
>>
>>This was discussed a while ago, and we have here the combination of two
>>bug/deficiencies:
>>
>>use-store
>>---------
>>why in hell is use-store to false??? IIRC, it was fist set to true
>>because the transient store was actually not transient and tried to
>>serialize compiled XSLTs in the persistant store, which failed because
>>these objects are not serializable.
>>
>>Let's switch it to true an ensure the transient store is really transient.
>>
>>XSLTProcessor
>>-------------
>>This component's design is intrinsically bad from a cache perspective:
>>the only way to access validity is through
>>getTransformerHandlerAndValidity which always creates the
>>TransformerHandler even if we don't use it. Combine this with
>>use-store=false, and we end up reparsing the XSL at each call.
>>
>>>As the parsing is very time consuming, delivering a cached
>>>content is still "slow". We had figures, where a cached content
>>>took 1.5 sec (and producing it from scratch took 1.8 sec).
>>>
>>>With the recent changes we are down below 100ms for delivering
>>>the cached content! I added a "check-includes" configuration to
>>>the TraxTransformer. If you set it to "false" imported stylesheet
>>>are not checked for changes for the caching, but you really feel
>>>the performance difference.
>>>
>>>So, you loose a little bit comfort but gain a lot of performance
>>>improvements. And if you use it only for production, it shouldn't
>>>be a problem anyway. (The default is "as-is")
>>
>>This way of solving the problem is hacky as it forces to choose between
>>speed and auto-reload and will often lead people to either not
>>understand why their changes are not taken into account or lead them to
>>choose the "secure" way by setting auto-releoad to false.
>>
>>We must refactor the XSLTProcessor so that:
>>- it returns a MultiSourceValidity if needed (see in o.a.c.c.source.impl
>>in scratchpad).
>>- getting the validity in the transient store is clearly separated from
>>getting the TransformerHandler.
>>
>>>PS: The new feature will be released with 2.1.3 in approx. two weeks.
>>
>>-1.
>>If you need the optimisation quickly for your customer, please make a
>>different class or keep it private until we do the clean refactoring.
>
>Which will never happen due to time constraints ...(just a joke).
>

You know what? I started this refactoring on my HD at the time this 
problem was raised, but never had the time to finish it...

>Anyway, I agree that refactoring the XSLTProcessor is a way to go and that with useStore the problem is not that important.
>
>BUT, then even if the content is fetched from the cache, the XSLT Processor is "activated" for the stylesheet, which is imho a total overkill for finding out that the main stylesheet has changed; so I still think this option is very very useful and doesn't do any harm.
>  
>

Sorry, I don't understand "activated". Do you mean that a 
TransformerHandler is created? That's exactly the design flaw I pointed out.

>Just make some speed comparision with and without the flag and see if it helps you as well.
>  
>

I'm more than sure that it helps! But I wouldn't like a temporary 
quick'n'dirty workaround go into a release...

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com



Re: [IMP] Performance problems with TraxTransformer

Posted by Joerg Heinicke <jh...@virbus.de>.
On 29.10.2003 10:09, Carsten Ziegeler wrote:

> Sylvain Wallez wrote:
> 
>>-1.
>>If you need the optimisation quickly for your customer, please make a 
>>different class or keep it private until we do the clean refactoring.
>>
> 
> I already committed it before I wrote the mail because I think it'S
> an important feature - but I can revert it. It's not important for
> our customer, it's just a thing that came up and we solved it differently
> anyway. But I think it's important for Cocoon.
> 
> Carsten

Why not do the refactoring on FirstFriday? Until that day we can use
Carsten's improvement.

Joerg


RE: [IMP] Performance problems with TraxTransformer

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Sylvain Wallez wrote:
> -1.
> If you need the optimisation quickly for your customer, please make a 
> different class or keep it private until we do the clean refactoring.
> 
I already committed it before I wrote the mail because I think it's
an important feature - but I can revert it. It's not important for
our customer, it's just a thing that came up and we solved it differently
anyway. But I think it's important for Cocoon.

Carsten 

Re: [IMP] Performance problems with TraxTransformer

Posted by Vadim Gritsenko <va...@verizon.net>.
Carsten Ziegeler wrote:

>Sylvain Wallez wrote:
>  
>
>>Carsten Ziegeler wrote:
>>    
>>
...

>>I added a "check-includes" configuration to
>>the TraxTransformer. If you set it to "false" imported stylesheet
>>are not checked for changes for the caching, but you really feel
>>the performance difference.
>>    
>>

...

>>>PS: The new feature will be released with 2.1.3 in approx. two weeks.
>>>      
>>>
>>-1.
>>If you need the optimisation quickly for your customer, please make a
>>different class or keep it private until we do the clean refactoring.
>>
>>    
>>
>Which will never happen due to time constraints ...(just a joke).
>
>Anyway, I agree that refactoring the XSLTProcessor is a way to go and
>that with useStore the problem is not that important.
>  
>

It's important for CLI -- if you persist stylesheet validities.

Vadim



RE: [IMP] Performance problems with TraxTransformer

Posted by Carsten Ziegeler <cz...@s-und-n.de>.
Sylvain Wallez wrote:
>
> Carsten Ziegeler wrote:
>
> >While debugging/profiling a very big application for our
> customer I found out that the current implementation of the
> TraxTransformer is slowing down caching!
> >
> >Why? Well, the caching algorithm asks every sitemap component if
> the cached content is still valid. The TraxTransformer answers
> this question by looking if the stylesheet has changed since the
> last use (time stamp comparison).
> >So far so good, but you can have imports/includes in your xslt,
> so the TraxTransformer checks them as well - and this is done by
> "parsing" the xslt and looking at all includes/imports. This
> parsing is done, even when the content is fetched out of the cache.
> >
> >Due to this mechanism, each stylesheet is parsed on every
> request (if cached content is used or not) which is in most cases
> unnecessary. As we didn't use the "use-store" parameter of the
> xslt transformer this is a real performance problem!
> >
> >
>
> Is the reparsing always occuring, even if when "use-store" is "true"? I
> guess not.
>
> This was discussed a while ago, and we have here the combination of two
> bug/deficiencies:
>
> use-store
> ---------
> why in hell is use-store to false??? IIRC, it was fist set to true
> because the transient store was actually not transient and tried to
> serialize compiled XSLTs in the persistant store, which failed because
> these objects are not serializable.
>
> Let's switch it to true an ensure the transient store is really transient.
>
> XSLTProcessor
> -------------
> This component's design is intrinsically bad from a cache perspective:
> the only way to access validity is through
> getTransformerHandlerAndValidity which always creates the
> TransformerHandler even if we don't use it. Combine this with
> use-store=false, and we end up reparsing the XSL at each call.
>
> >As the parsing is very time consuming, delivering a cached
> content is still "slow". We had figures, where a cached content
> took 1.5 sec (and producing it from scratch took 1.8 sec).
> >
> >With the recent changes we are down below 100ms for delivering
> the cached content! I added a "check-includes" configuration to
> the TraxTransformer. If you set it to "false" imported stylesheet
> are not checked for changes for the caching, but you really feel
> the performance difference.
> >
> >So, you loose a little bit comfort but gain a lot of performance
> improvements. And if you use it only for production, it shouldn't
> be a problem anyway. (The default is "as-is")
> >
> >
>
> This way of solving the problem is hacky as it forces to choose between
> speed and auto-reload and will often lead people to either not
> understand why their changes are not taken into account or lead them to
> choose the "secure" way by setting auto-releoad to false.
>
> We must refactor the XSLTProcessor so that:
> - it returns a MultiSourceValidity if needed (see in o.a.c.c.source.impl
> in scratchpad).
> - getting the validity in the transient store is clearly separated from
> getting the TransformerHandler.
>
> >PS: The new feature will be released with 2.1.3 in approx. two weeks.
> >
> >
>
> -1.
> If you need the optimisation quickly for your customer, please make a
> different class or keep it private until we do the clean refactoring.
>
Which will never happen due to time constraints ...(just a joke).

Anyway, I agree that refactoring the XSLTProcessor is a way to go and
that with useStore the problem is not that important.

BUT, then even if the content is fetched from the cache, the XSLT
Processor is "activated" for the stylesheet, which is imho a total
overkill for finding out that the main stylesheet has changed; so
I still think this option is very very useful and doesn't do any
harm.
Just make some speed comparisons with and without the flag and
see if it helps you as well.

Carsten


Re: [IMP] Performance problems with TraxTransformer

Posted by Sylvain Wallez <sy...@apache.org>.
Carsten Ziegeler wrote:

>While debugging/profiling a very big application for our customer I found out that the current implementation of the TraxTransformer is slowing down caching!
>
>Why? Well, the caching algorithm asks every sitemap component if the cached content is still valid. The TraxTransformer answers this question by looking if the stylesheet has changed since the last use (time stamp comparison).
>So far so good, but you can have imports/includes in your xslt, so the TraxTransformer checks them as well - and this is done by "parsing" the xslt and looking at all includes/imports. This parsing is done, even when the content is fetched out of the cache. 
>
>Due to this mechanism, each stylesheet is parsed on every request (if cached content is used or not) which is in most cases unnecessary. As we didn't use the "use-store" parameter of the xslt transformer this is a real performance problem!
>  
>

Is the reparsing always occurring, even when "use-store" is "true"? I
guess not.

This was discussed a while ago, and we have here the combination of two
bugs/deficiencies:

use-store
---------
why in hell is use-store set to false??? IIRC, it was first set to false
because the transient store was actually not transient and tried to
serialize compiled XSLTs in the persistent store, which failed because
these objects are not serializable.

Let's switch it to true and ensure the transient store is really transient.

XSLTProcessor
-------------
This component's design is intrinsically bad from a cache perspective: 
the only way to access validity is through 
getTransformerHandlerAndValidity which always creates the 
TransformerHandler even if we don't use it. Combine this with 
use-store=false, and we end up reparsing the XSL at each call.

>As the parsing is very time consuming, delivering a cached content is still "slow". We had figures, where a cached content took 1.5 sec (and producing it from scratch took 1.8 sec).
>
>With the recent changes we are down below 100ms for delivering the cached content! I added a "check-includes" configuration to the TraxTransformer. If you set it to "false" imported stylesheet are not checked for changes for the caching, but you really feel the performance difference.
>
>So, you loose a little bit comfort but gain a lot of performance improvements. And if you use it only for production, it shouldn't be a problem anyway. (The default is "as-is")
>  
>

This way of solving the problem is hacky as it forces people to choose
between speed and auto-reload, and it will often lead them either to not
understand why their changes are not taken into account or to
choose the "secure" way by setting auto-reload to false.

We must refactor the XSLTProcessor so that:
- it returns a MultiSourceValidity if needed (see in o.a.c.c.source.impl 
in scratchpad).
- getting the validity in the transient store is clearly separated from 
getting the TransformerHandler.
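
The two points above could look roughly like this. It is a sketch under assumptions, not the MultiSourceValidity class from the scratchpad: AggregateValidity and its methods are hypothetical, and each source is reduced to a callback that reports its current time stamp. The list of sources still has to be discovered once, by parsing the stylesheet when it is first compiled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical aggregate validity over a main stylesheet and its includes. */
final class AggregateValidity {
    private static final class Tracked {
        final LongSupplier currentStamp; // reports the source's time stamp now
        final long recordedStamp;        // time stamp seen when it was added

        Tracked(LongSupplier currentStamp, long recordedStamp) {
            this.currentStamp = currentStamp;
            this.recordedStamp = recordedStamp;
        }
    }

    private final List<Tracked> sources = new ArrayList<>();

    /** Records the source's current time stamp as the reference value. */
    void addSource(LongSupplier lastModified) {
        sources.add(new Tracked(lastModified, lastModified.getAsLong()));
    }

    /** Valid only while every tracked source still reports its recorded stamp. */
    boolean isValid() {
        for (Tracked t : sources) {
            if (t.currentStamp.getAsLong() != t.recordedStamp) {
                return false;
            }
        }
        return true;
    }
}
```

Checking such an object needs neither a parse nor a TransformerHandler, so auto-reload of included stylesheets would not have to be traded against speed.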

>PS: The new feature will be released with 2.1.3 in approx. two weeks.
>  
>

-1.
If you need the optimisation quickly for your customer, please make a 
different class or keep it private until we do the clean refactoring.

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com




