Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2002/10/06 19:30:11 UTC
[FYI] Profiling Cocoon...
Hello people,
I'm currently at Giacomo's place and we spent a rainy afternoon
profiling the latest Cocoon to see if there is something we could
fix/improve/blah-blah.
WARNING: this is *by no means* a scientific report. But we have tried to
be as informative as possible for developers.
We were running Tomcat 4.1.10 + Cocoon HEAD on Sun JDK 1.4.1-b21 on
linux, instrumented with Borland OptimizeIt 4.2.
Here is what we discovered:
1) Regarding memory leaks, Cocoon seems absolutely clean (for cocoon, we
mean org.apache.cocoon.* classes). Avalon seems to be clean as well.
Good job everyone.
2) we noticed an incredible use of
org.apache.avalon.excalibur.collections.BucketMap$Node. It is *by far*
the most used class in the heap. More than Strings, byte[], char[] and
int[]. Some 140000 instances of that class.
The number of bucketmap nodes grows linearly with the amount of
different pages accessed (as they are fed into the cache), but even a
cached resource creates some 44 new nodes, which are later garbage
collected.
44 is nothing compared to 140000, but still something to investigate.
So, discovery #1:
BucketMaps are used *a lot*. Be aware of this.
3) Catalina seems to be spending 10% of the pipeline time. Having
extensively profiled and carefully optimized a servlet engine (JServ) I
can tell you that this is *WAY* too much. Catalina doesn't seem like the
best choice to run a loaded servlet-based site (contact pier@apache.org
if you want to do something about it: he's working on Jerry, a
super-light servlet engine based on native APR and targeted especially
for Apache 2.0)
4) java IO takes something from 20% to 35% of the entire request time
(reading and writing from the socket). This could well be a problem with
the instrumented JVM since I don't think the JDK 1.4 is that slow on IO
(especially using the new NIO facilities internally)
5) most of the time is spent on:
a) XSLT processing (and we knew that)
b) DTD parsing (and that was a surprise for me!)
Yeah, DTD parsing. No, not for validation, but for entity resolution. It
seems that even if the parser is non-validating, the DTD is fully parsed
anyway just to do entity evaluation.
So, discovery #2:
Be careful about DTDs even if the parser is not validating.
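One way to limit this cost while keeping entity expansion is to answer the parser's DTD requests from memory instead of fetching the DTD again for every document. The sketch below assumes a plain JAXP SAX parser and uses the standard `org.xml.sax.EntityResolver` hook; the class `CachedDtdResolver`, its `register` and `parseText` methods, and the `http://example.invalid/doc.dtd` system id are hypothetical, for illustration only. It removes the I/O, not the work of parsing the DTD itself.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical resolver: serves DTDs from memory so entities expand without re-fetching. */
public class CachedDtdResolver implements EntityResolver {
    // system id -> DTD bytes (pre-registered here; a real cache might fill lazily)
    private final Map<String, byte[]> dtds = new ConcurrentHashMap<>();

    public void register(String systemId, String dtdText) {
        dtds.put(systemId, dtdText.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        byte[] dtd = dtds.get(systemId);
        if (dtd == null) {
            return null; // unknown id: let the parser resolve it the normal way
        }
        InputSource source = new InputSource(new ByteArrayInputStream(dtd));
        source.setSystemId(systemId); // keeps relative references inside the DTD resolvable
        return source;
    }

    /** Parses xml with a non-validating SAX parser and returns all character data. */
    public static String parseText(String xml, EntityResolver resolver) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false); // not validating, yet the DTD is still read for entities
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }
}
```

With a DTD registered under that system id declaring `<!ENTITY product "Cocoon">`, a document whose DOCTYPE points at it and whose body is `<doc>Hello &product;</doc>` parses to the text `Hello Cocoon` without any network access.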
Of course, when the cache kicks in and the cached document is read
directly from the compiled SAX events, we have an incredible speed
improvement (also because entities are already resolved and hardwired).
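Reading "directly from the compiled SAX events" can be pictured as recording the events of one parse and replaying them on later requests. The minimal sketch below is not Cocoon's actual cache format; `SaxEventRecorder`, `record` and `replayText` are hypothetical names. Because the recorder stores the characters the parser emitted, entity references are already expanded at replay time and no DTD is touched. It copies the `char[]` and `Attributes` it receives, since SAX parsers may reuse those objects; a complete recorder would also capture prefix mappings, processing instructions and ignorable whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical recorder: captures SAX events once, replays them on cache hits. */
public class SaxEventRecorder extends DefaultHandler {
    @FunctionalInterface
    private interface Event {
        void replay(ContentHandler target) throws SAXException;
    }

    private final List<Event> events = new ArrayList<>();

    @Override public void startDocument() { events.add(ContentHandler::startDocument); }
    @Override public void endDocument() { events.add(ContentHandler::endDocument); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Attributes copy = new AttributesImpl(atts); // the parser may reuse its Attributes object
        events.add(h -> h.startElement(uri, localName, qName, copy));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add(h -> h.endElement(uri, localName, qName));
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        char[] copy = Arrays.copyOfRange(ch, start, start + length); // the buffer may be reused too
        events.add(h -> h.characters(copy, 0, copy.length));
    }

    /** Replays the recorded events: no parsing, no DTD, entities already expanded. */
    public void replay(ContentHandler target) throws SAXException {
        for (Event e : events) {
            e.replay(target);
        }
    }

    /** Parses xml once and returns the recording. */
    public static SaxEventRecorder record(String xml) {
        SaxEventRecorder recorder = new SaxEventRecorder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(recorder);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return recorder;
    }

    /** Example helper: replays into a handler that concatenates the character data. */
    public static String replayText(SaxEventRecorder recorder) {
        StringBuilder text = new StringBuilder();
        try {
            recorder.replay(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }
}
```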
6) Xalan incremental seems to be a little slower than regular Xalan, but
on multiprocessing machines this might not be the case [Xalan uses two
threads for incremental processing]
NOTE: Xalan doesn't pool threads when it does that!
So, while perceived performance is better for Xalan in incremental mode,
the overall load of the machine is reduced if Xalan is used normally.
7) XSLTC *IS* blazingly fast compared to Xalan and is much less resource
intensive.
Discovery #3:
use XSLTC as much as possible!
NOTE: our current root sitemap.xmap indicates that XSLTC is the default XSLT
engine for Cocoon 2.1, but the XSLTC factory is actually commented out, so
Xalan is what runs. We should either remove that comment or uncomment the
XSLTC factory.
I vote for making XSLTC the default even if this generates a few bug reports.
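Outside the sitemap, the JAXP usage pattern that benefits most from a compiling processor is to build a `Templates` object once and create a cheap `Transformer` from it per request. This is a sketch under stated assumptions, not Cocoon's TraxTransformer: `CompiledStylesheet` is a hypothetical name, and it relies on the standard `javax.xml.transform.TransformerFactory` lookup. Setting that system property to `org.apache.xalan.xsltc.trax.TransformerFactoryImpl` selects XSLTC from a Xalan-J jar; current JDKs already default to a built-in processor derived from XSLTC.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: compile a stylesheet once, run it many times. */
public class CompiledStylesheet {
    private final Templates templates; // compiled form; JAXP requires Templates to be thread-safe

    public CompiledStylesheet(String xslt) {
        try {
            // Honors the javax.xml.transform.TransformerFactory system property if set.
            TransformerFactory factory = TransformerFactory.newInstance();
            this.templates = factory.newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("stylesheet does not compile", e);
        }
    }

    public String transform(String xml) {
        try {
            Transformer t = templates.newTransformer(); // cheap; one per request, not shared
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```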
8) Cocoon's hotspot is.... drum roll.... URI matching.
TreeProcessor is complex and adds lots of complexity to the call stacks,
but it seems to be very lightweight. It's URI matching that is the thing
that needs more work performance-wise.
Don't get me wrong, my numbers indicate that URI matching takes 3% to 8%
of response time. Compared to the rest it is nothing, but since this is
the only part we have total control over, this is where we should
concentrate profiling efforts.
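To illustrate why precompiling patterns matters for this hotspot, here is a minimal sketch of a wildcard matcher that converts a sitemap-style pattern into a `java.util.regex.Pattern` once, so each request pays only for the match. It assumes Cocoon-like semantics (`*` stays within one path segment, `**` crosses `/`, captures become `{1}`, `{2}`, ...) and is not the actual `WildcardURIMatcher` or its helper; `CompiledWildcard` is a hypothetical name. Since matchers in a sitemap are tried in document order, the number of patterns tested per request may matter as much as the cost of each test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: compile a wildcard into a regex once, at sitemap load time. */
public final class CompiledWildcard {
    private final Pattern regex;

    public CompiledWildcard(String wildcard) {
        StringBuilder sb = new StringBuilder();
        int literalStart = 0;
        int i = 0;
        while (i < wildcard.length()) {
            if (wildcard.charAt(i) != '*') {
                i++;
                continue;
            }
            if (i > literalStart) {
                sb.append(Pattern.quote(wildcard.substring(literalStart, i))); // literal text
            }
            if (i + 1 < wildcard.length() && wildcard.charAt(i + 1) == '*') {
                sb.append("(.*)");    // "**": any characters, including '/'
                i += 2;
            } else {
                sb.append("([^/]*)"); // "*": anything except '/'
                i += 1;
            }
            literalStart = i;
        }
        if (literalStart < wildcard.length()) {
            sb.append(Pattern.quote(wildcard.substring(literalStart)));
        }
        regex = Pattern.compile(sb.toString());
    }

    /** Returns the captured groups ({1}, {2}, ...) or null if the URI does not match. */
    public List<String> match(String uri) {
        Matcher m = regex.matcher(uri);
        if (!m.matches()) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int g = 1; g <= m.groupCount(); g++) {
            groups.add(m.group(g));
        }
        return groups;
    }
}
```

For example, `docs/*.html` captures `index` from `docs/index.html` but does not match `docs/a/b.html`, while `docs/**` captures `a/b.html` from the latter.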
Ok, that's it. Enough for a rainy Swiss afternoon.
Anyway, Cocoon is pretty optimized for what we could see. So let's be
happy about it.
--
Stefano Mazzocchi <st...@apache.org>
--------------------------------------------------------------------
---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org
Re: [FYI] Profiling Cocoon...
Posted by "J.Pietschmann" <j3...@yahoo.de>.
Stefano Mazzocchi wrote:
> Yeah, DTD parsing. No, not for validation, but for entity resolution. It
> seems that even if the parser is non-validating, the DTD is fully parsed
> anyway just to do entity evaluation.
No surprise here: that's required by the spec, and obviously quite
rightly so.
*The* reason why I hate entities, and full DocBook.
J.Pietschmann
Re: [FYI] Profiling Cocoon...
Posted by Ivelin Ivanov <iv...@apache.org>.
Thank you for the profiling, Stefano !
... and
Welcome Back to The Party !
----- Original Message -----
From: "Stefano Mazzocchi" <st...@apache.org>
To: "Apache Cocoon" <co...@xml.apache.org>
Sent: Sunday, October 06, 2002 12:30 PM
Subject: [FYI] Profiling Cocoon...
> (contact pier@apache.org
> if you want to do something about it: he's working on Jerry, a
> super-light servlet engine based on native APR and targeted especially
> for Apache 2.0)
A link to Jerry?
>
> 5) most of the time is spent on:
>
> a) XSLT processing (and we knew that)
> 7) XSLTC *IS* blazingly fast compared to Xalan and is much less resource
> intensive.
>
> Discovery #3:
>
> use XSLTC as much as possible!
Agreed. Sun's XSLTC team did a great job helping us work through a lot of
bugs. XSLTC has reached a stage that is good enough for many applications,
at least all the ones that I am using with Cocoon.
Therefore I made XSLTC the default engine. However, problems were reported
in some XSLT-intensive applications, and if I am not mistaken this was the
reason why XSLTC was switched out.
I absolutely agree with you that we should make XSLTC the default
transformer and continue working with Tom and the other XSLTC guys to fix
any outstanding bugs. It would be lovely if 2.1 ships with XSLTC.
I hope that Tom and Jacek are reading this message and will comment on the
current status of XSLTC and how it compares to Gregor now.
I think that XSLTC, Caching and the sitemap Expire attribute will make 2.1 a
very decent container for scalable applications.
BTW, do we have a document on the "expire" attribute already?
>
> NOTE: our current root sitemap.xmap indicates that XSLTC is the default XSLT
> engine for Cocoon 2.1, but the fact is that the XSLTC factory is
> commented out, resulting in running Xalan. We should either remove that
> comment or uncomment the XSLTC factory.
>
> I vote for making XSLTC the default even if this generates a few bug reports.
+10e+100
>
> Anyway, Cocoon is pretty optimized for what we could see. So let's be
> happy about it.
Hooray!
Ivelin
Re: [FYI] Profiling Cocoon...
Posted by Berin Loritsch <bl...@apache.org>.
Stefano Mazzocchi wrote:
> Hello people,
>
> 2) we noticed an incredible use of
> org.apache.avalon.excalibur.collections.BucketMap$Node. It is *by far*
> the most used class in the heap. More than Strings, byte[], char[] and
> int[]. Some 140000 instances of that class.
>
> The number of bucketmap nodes grows linearly with the amount of
> different pages accessed (as they are fed into the cache), but even a
> cached resource creates some 44 new nodes, which are later garbage
> collected.
>
> 44 is nothing compared to 140000, but still something to investigate.
>
> So, discovery #1:
>
> BucketMaps are used *a lot*. Be aware of this.
BucketMaps shine when high concurrency is an issue, as in web environments.
They are what backs the ECM and Fortress.
A BucketMap has a fixed number of buckets, but nodes are added as needed.
During the transfer of ownership of the BucketMap to Commons we found a
bug in the hashcode lookup. For the fix, I suggest upgrading to the Commons
version (in CVS). The Excalibur code will be upgraded ASAP.
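The structure described above can be sketched as a bucket array whose size is fixed at construction, one linked list of nodes per bucket, and one lock per bucket so that threads touching different buckets never contend. This is a hypothetical `SimpleBucketMap`, not the Excalibur or Commons source. It also suggests why `BucketMap$Node` instances pile up in a heap profile: every `put` of a new key allocates a node, and every `remove` leaves one behind for the garbage collector.

```java
/**
 * Hypothetical sketch of the BucketMap idea: fixed buckets, per-bucket locks.
 * Not the Excalibur/Commons implementation.
 */
public final class SimpleBucketMap<K, V> {
    private static final class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets; // size fixed at construction, never rehashed
    private final Object[] locks;       // one lock per bucket

    @SuppressWarnings("unchecked")
    public SimpleBucketMap(int bucketCount) {
        buckets = (Node<K, V>[]) new Node[bucketCount];
        locks = new Object[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            locks[i] = new Object();
        }
    }

    private int indexFor(Object key) {
        // Mask the sign bit so negative hash codes still map to a valid bucket.
        return (key.hashCode() & 0x7fffffff) % buckets.length;
    }

    public V put(K key, V value) {
        int i = indexFor(key);
        synchronized (locks[i]) {
            for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
                if (n.key.equals(key)) {
                    V old = n.value;
                    n.value = value;
                    return old;
                }
            }
            buckets[i] = new Node<>(key, value, buckets[i]); // each new key allocates one Node
            return null;
        }
    }

    public V get(Object key) {
        int i = indexFor(key);
        synchronized (locks[i]) {
            for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
                if (n.key.equals(key)) {
                    return n.value;
                }
            }
            return null;
        }
    }

    public V remove(Object key) {
        int i = indexFor(key);
        synchronized (locks[i]) {
            Node<K, V> prev = null;
            for (Node<K, V> n = buckets[i]; n != null; prev = n, n = n.next) {
                if (n.key.equals(key)) {
                    if (prev == null) {
                        buckets[i] = n.next;
                    } else {
                        prev.next = n.next;
                    }
                    return n.value; // the unlinked Node is now garbage
                }
            }
            return null;
        }
    }
}
```

Keys with equal hash codes, such as `"Aa"` and `"BB"`, land in the same bucket and simply coexist in its list.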
--
"They that give up essential liberty to obtain a little temporary safety
deserve neither liberty nor safety."
- Benjamin Franklin
Re: [FYI] Profiling Cocoon...
Posted by Stefano Mazzocchi <st...@apache.org>.
Sylvain Wallez wrote:
> AFAIK, bucketmaps are used as soon as a component is looked up, and
> getting a page from cache shouldn't reduce much the number of lookups
> since the pipeline has to be built to get the cache key and validity.
True, but who is 'creating' those new BucketMap$Nodes that are later
garbage collected?
> <thinking-loudly>
> What could save some lookups is to have more ThreadSafe components,
> including pipeline components. For example, a generator could
> theoretically be threadsafe (it has mainly one generate() method), but
> the fact that setup() and generate() are separated currently prevents this.
>
> Also we have to consider that component lookup is more costly than
> instantiating a small object. Knowing this, some transformers and
> serializers can be thought of as factories of some lightweight content
> handlers that do the actual job. These transformers and serializers
> could then also be made ThreadSafe and thus avoid per-request lookup.
>
> This would require some new interfaces, which should coexist with the
> old ones to ensure backwards compatibility.
>
> Thoughts ?
I don't think lookup is that expensive.
It is true that the JVM is optimized for object creation and GC, but
extensive stress tests run by one of the biggest cell phone companies in
Europe (I can't tell you which one, sorry) showed significant pauses in
processing due to GC kicking in.
We have discovered through profiling that each request handled by Cocoon
generates a large amount of garbage: these are all the payloads of the SAX
events that must be generated, passed along and garbage-collected at the end.
I've started to think about how we can recycle those objects, but I think
we are stretching the limits of what we can do inside the JVM, since
pauses in JVM execution due to GC are probably a problem with the GC
algorithm rather than poor use of resources on our side.
Personally, I would not go through backward-incompatible changes in
interfaces just to avoid a few object lookups.
> </thinking-loudly>
>
>> 3) Catalina seems to be spending 10% of the pipeline time. Having
>> extensively profiled and carefully optimized a servlet engine (JServ)
>> I can tell you that this is *WAY* too much. Catalina doesn't seem like
>> the best choice to run a loaded servlet-based site (contact
>> pier@apache.org if you want to do something about it: he's working on
>> Jerry, a super-light servlet engine based on native APR and targeted
>> especially for Apache 2.0)
>
>
>
> www.betaversion.org has been down for several weeks now...
Don't tell me: my mail went down the drain with it :/ You should mail
pier directly if you need more info on that.
> I'm happy to hear that :-) The TreeProcessor was designed to be as fast
> as possible, even if interpreted: pre-process everything that can be,
> and pre-lookup components when they're ThreadSafe. Call stacks can be
> impressive, but each frame performs very few computations.
Yep, profiling confirms that.
>> It's URI matching that is the thing that needs more work
>> performance-wise.
>>
>> Don't get me wrong, my numbers indicate that URI matching takes 3%
>> to 8% of response time. Compared to the rest it is nothing, but since
>> this is the only thing we have total control over, this is where we
>> should concentrate profiling efforts.
>
>
>
> Do you mean the WildcardURIMatcher?
Yes.
> Is this related to the matching
> algorithm, or to the number of patterns that are to be tested for a
> typical request handling?
Don't know. The profiler adds up the time spent in each class and
method. Anyway, that is the class in the org.apache.cocoon.* namespace
where most time is spent (on average).
>> Ok, that's it. Enough for a rainy Swiss afternoon.
>>
>> Anyway, Cocoon is pretty optimized for what we could see. So let's be
>> happy about it.
>
>
>
> Have you compared the respective speeds of 2.0.x and 2.1 on the same
> application?
No, but such performance comparisons should not be done at this level of
granularity, but with external load-testing tools such as JMeter.
> It would be interesting to know if 2.1 performs
> better than its ancestor.
Yes, most definitely. Anyone willing to take the challenge?
--
Stefano Mazzocchi <st...@apache.org>
--------------------------------------------------------------------
Re: [FYI] Profiling Cocoon...
Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:
> yes and, admittedly, this sucks from a diversity of community perspective. But
> should I remind you that Xalan suffered more or less the same problem for at
> least 18 months?
Nope - and IMO, the *core* xml.apache.org tools are still quite heavily
'supported' by Sun or IBM. Dunno how this is on the other side of the
pond (Jakarta, especially Tomcat).
Oh well - I'm in a rant mode these days:
http://radio.weblogs.com/0103539/2002/10/09.html#a37
>>, and I recently organized an XSLT seminar with Michael Kay who
>>was quite 'amused' w.r.t. XSLTC compliance & partial performance
>>optimization of XSLTC. But he's obviously biased :-)
>
>
> Can you please elaborate more on this?
No hard facts to support this, sorry. It was something he muttered in
response to my questions on XSLTC.
>>>Anyway, just a reminder: you never get people to scratch if you don't
>>>create some itches :)
>>
>>Would that be itches or just pet peeves? ;-)
>
>
> I think nobody here gives a damn about what XSLT engine they are using as long
> as it's fast and compliant. I'll leave ego fights to those who still enjoy them.
Hey, cool down ;-)
You and Ivelin have been advocating XSLTC for a long time - and I value
your effort doing so. But we are allowed to make jokes, no?
<snip/>
>>I believe we should definitely start warning people upfront that they
>>really should stick to release versions, instead of relying on CVS
>>checkouts of HEAD/2.1-dev - for some reason, there are quite a few people
>>using CVS instead of our release version. But that's another rant.
>
>
> I think that a WARNING page is enough for people that want to try things out
> and know where we are heading and planning in advance. And I think they know
> very well the cost of rewriting things when something changes under your feet.
> The use of open source software is partially because of that.
Nope, it's not enough. A lot of people are using daily CVS builds as
development/production infrastructure, which is good for bug testing, but
also brings an enormous amount of 'this was working in CVS of
dd-mm-yyyy' mails to the list. They should be motivated to use stable
builds instead, maybe by backporting some of the nicer 2.1 HEAD features
to 2.0.3.
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org stevenn@apache.org
Re: [FYI] Profiling Cocoon...
Posted by Stefano Mazzocchi <st...@apache.org>.
Quoting Steven Noels <st...@outerthought.org>:
> >> I'm not saying we shouldn't be bug testing for XSLTC, it's just that I
> >> don't know if the XSLTC community will be there to follow up on our
> >> bug reports.
> >
> >
> > I hear you. Consider it a stress-test of both the software *and* the
> > community around it.
>
> I've been investigating
> http://marc.theaimsgroup.com/?l=xalan-cvs&s=xsltc a bit and it seems
> like there are some people actively working on it. Only Sun-people
> however
Yes and, admittedly, this sucks from a community diversity perspective. But
should I remind you that Xalan suffered more or less the same problem for at
least 18 months?
>, and I recently organized an XSLT seminar with Michael Kay who
> was quite 'amused' w.r.t. XSLTC compliance & partial performance
> optimization of XSLTC. But he's obviously biased :-)
Can you please elaborate more on this?
> > Anyway, just a reminder: you never get people to scratch if you don't
> > create some itches :)
>
> Would that be itches or just pet peeves? ;-)
I think nobody here gives a damn about what XSLT engine they are using as long
as it's fast and compliant. I'll leave ego fights to those who still enjoy them.
> > And if this thing doesn't work out as expected, we can always ship
> > Cocoon 2.1 final with Xalan enabled.
> >
> > What do you think?
>
> Fair enough. We'll be a prime beta test site for both Avalon and XSLTC.
At one point, Sam Ruby was very puzzled by the ability of the Cocoon community
to work with so many different projects, all of them on the bleeding edge,
and still manage not to piss off users every day.
That led to the creation of Gump, which pretty much shows that the earlier
hidden contracts are made visible, the more solid the whole net of contracts
becomes.
> I believe we should definitely start warning people upfront that they
> really should stick to release versions, instead of relying on CVS
> checkouts of HEAD/2.1-dev - for some reason, there are quite a few people
> using CVS instead of our release version. But that's another rant.
I think that a WARNING page is enough for people that want to try things out
and know where we are heading and planning in advance. And I think they know
very well the cost of rewriting things when something changes under your feet.
The use of open source software is partially because of that.
--
Stefano Mazzocchi <st...@apache.org>
------------------------------------------------------------
Re: [FYI] Profiling Cocoon...
Posted by Steven Noels <st...@outerthought.org>.
Stefano Mazzocchi wrote:
> Correct, but this is not a good reason to have them run their
> well-thought stylesheets slower, don't you think?
Agree.
>> I'm not saying we shouldn't be bug testing for XSLTC, it's just that I
>> don't know if the XSLTC community will be there to follow up on our
>> bug reports.
>
>
> I hear you. Consider it a stress-test of both the software *and* the
> community around it.
I've been investigating
http://marc.theaimsgroup.com/?l=xalan-cvs&s=xsltc a bit and it seems
like there are some people actively working on it. Only Sun-people
however, and I recently organized an XSLT seminar with Michael Kay who
was quite 'amused' w.r.t. XSLTC compliance & partial performance
optimization of XSLTC. But he's obviously biased :-)
> Anyway, just a reminder: you never get people to scratch if you don't
> create some itches :)
Would that be itches or just pet peeves? ;-)
> And if this thing doesn't work out as expected, we can always ship
> Cocoon 2.1 final with Xalan enabled.
>
> What do you think?
Fair enough. We'll be a prime beta test site for both Avalon and XSLTC.
I believe we should definitely start warning people upfront that they
really should stick to release versions, instead of relying on CVS
checkouts of HEAD/2.1-dev - for some reason, there are quite a few people
using CVS instead of our release version. But that's another rant.
Thanks for your analysis, BTW!
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org stevenn@apache.org
Re: [FYI] Profiling Cocoon...
Posted by Stefano Mazzocchi <st...@apache.org>.
Steven Noels wrote:
>>> Discovery #3:
>>>
>>> use XSLTC as much as possible!
>>>
>>> NOTE: our current root sitemap.xmap indicates that XSLTC is default
>>> XSLT engine for Cocoon 2.1, but the fact is that the XSLTC factory is
>>> commented out, resulting in running Xalan. We should either remove
>>> that comment or uncomment the XSLTC factory.
>>>
>>> I vote for making XSLTC default even if this generates a few bug
>>> reports.
>>
>>
>>
>>
>> +1
>
>
> Do we have an active development community around XSLTC?
We don't, but Xalan does.
> Serverside XSLT processing is one of the key usage areas of Cocoon. I
> might be wrong on this, but making XSLTC the default, while it is still
> known as an incomplete XSLT 1.0 implementation, without having run the
> conformance tests ourselves to prove the contrary, might well be bad
> advertising for Cocoon.
My proposal is to turn it on in Cocoon 2.1 as it is in CVS HEAD, or maybe
when we reach the first beta. Then see what happens.
> While the previous problems have been solved, it took some time and
> deliberation. Xalan2 is stable. The problem also might be that people
> are often misusing XSLT for things that could better be solved with
> plain Cocoon components.
Correct, but this is not a good reason to have them run their
well-thought stylesheets slower, don't you think?
> I'm not saying we shouldn't be bug testing for XSLTC, it's just that I
> don't know if the XSLTC community will be there to follow up on our bug
> reports.
I hear you. Consider it a stress-test of both the software *and* the
community around it.
Anyway, just a reminder: you never get people to scratch if you don't
create some itches :)
And if this thing doesn't work out as expected, we can always ship
Cocoon 2.1 final with Xalan enabled.
What do you think?
--
Stefano Mazzocchi <st...@apache.org>
--------------------------------------------------------------------
Re: [FYI] Profiling Cocoon...
Posted by Steven Noels <st...@outerthought.org>.
>> Discovery #3:
>>
>> use XSLTC as much as possible!
>>
>> NOTE: our current root sitemap.xmap indicates that XSLTC is default
>> XSLT engine for Cocoon 2.1, but the fact is that the XSLTC factory is
>> commented out, resulting in running Xalan. We should either remove
>> that comment or uncomment the XSLTC factory.
>>
>> I vote for making XSLTC default even if this generates a few bug reports.
>
>
>
> +1
Do we have an active development community around XSLTC?
Serverside XSLT processing is one of the key usage areas of Cocoon. I
might be wrong on this, but making XSLTC the default, while it is still
known as an incomplete XSLT 1.0 implementation, without having run the
conformance tests ourselves to prove the contrary, might well be bad
advertising for Cocoon.
While the previous problems have been solved, it took some time and
deliberation. Xalan2 is stable. The problem also might be that people
are often misusing XSLT for things that could better be solved with
plain Cocoon components.
I'm not saying we shouldn't be bug testing for XSLTC, it's just that I
don't know if the XSLTC community will be there to follow up on our bug
reports.
</Steven>
--
Steven Noels http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
stevenn@outerthought.org stevenn@apache.org
Re: [FYI] Profiling Cocoon...
Posted by Giacomo Pati <gi...@apache.org>.
On Sun, 6 Oct 2002, Sylvain Wallez wrote:
> Stefano Mazzocchi wrote:
>
> > <snip/>
> >
> > So, discovery #1:
> >
> > BucketMaps are used *a lot*. Be aware of this.
>
>
> AFAIK, bucketmaps are used as soon as a component is looked up, and
> getting a page from cache shouldn't reduce much the number of lookups
> since the pipeline has to be built to get the cache key and validity.
>
> <thinking-loudly>
> What could save some lookups is to have more ThreadSafe components,
> including pipeline components. For example, a generator could
> theoretically be threadsafe (it has mainly one generate() method), but
> the fact that setup() and generate() are separated currently prevents this.
IIRC this is what Berin proposed long ago, but it never made it into the code.
I'm not sure we can unify those methods without breaking contracts.
> Also we have to consider that component lookup is more costly than
> instantiating a small object. Knowing this, some transformers and
> serializers can be thought of as factories of some lightweight content
> handlers that do the actual job. These transformers and serializers
> could then also be made ThreadSafe and thus avoid per-request lookup.
Ok.
> This would require some new interfaces, which should coexist with the
> old ones to ensure backwards compatibility.
Probably.
Giacomo
Re: [FYI] Profiling Cocoon...
Posted by Sylvain Wallez <sy...@anyware-tech.com>.
Stefano Mazzocchi wrote:
> Hello people,
>
> I'm currently at Giacomo's place and we spent a rainy afternoon
> profiling the latest Cocoon to see if there is something we could
> fix/improve/blah-blah.
>
> WARNING: this is *by no means* a scientific report. But we have tried
> to be as informative as possible for developers.
>
> We were running Tomcat 4.1.10 + Cocoon HEAD on Sun JDK 1.4.1-b21 on
> linux, instrumented with Borland OptimizeIt 4.2.
>
> Here is what we discovered:
>
> 1) Regarding memory leaks, Cocoon seems absolutely clean (for cocoon,
> we mean org.apache.cocoon.* classes). Avalon seems to be clean as
> well. Good job everyone.
>
> 2) we noticed an incredible use of
> org.apache.avalon.excalibur.collections.BucketMap$Node. It is *by far*
> the most used class in the heap. More than Strings, byte[], char[] and
> int[]. Some 140000 instances of that class.
>
> The number of bucketmap nodes grows linearly with the amount of
> different pages accessed (as they are fed into the cache), but even a
> cached resource creates some 44 new nodes, which are later garbage
> collected.
>
> 44 is nothing compared to 140000, but still something to investigate.
>
> So, discovery #1:
>
> BucketMaps are used *a lot*. Be aware of this.
AFAIK, bucketmaps are used as soon as a component is looked up, and
getting a page from the cache shouldn't reduce the number of lookups much,
since the pipeline has to be built to get the cache key and validity.
<thinking-loudly>
What could save some lookups is to have more ThreadSafe components,
including pipeline components. For example, a generator could
theoretically be threadsafe (it has mainly one generate() method), but
the fact that setup() and generate() are separate currently prevents this.
Also we have to consider that component lookup is more costly than
instantiating a small object. Knowing this, some transformers and
serializers can be thought of as factories of lightweight content
handlers that do the actual job. These transformers and serializers
could then also be made ThreadSafe and thus avoid per-request lookups.
This would require some new interfaces, which should coexist with the
old ones to ensure backward compatibility.
Thoughts?
</thinking-loudly>
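The idea of a ThreadSafe component acting as a factory of lightweight content handlers could look roughly like the sketch below. `HandlerFactory` and `UpperCaseTransformer` are hypothetical, not existing Cocoon or Avalon interfaces: the shared component holds no per-request state and would be looked up once, while each request only allocates a small SAX filter (`org.xml.sax.helpers.XMLFilterImpl`) that forwards events to the next pipeline stage.

```java
import java.util.Locale;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical sketch of a ThreadSafe transformer that hands out per-request handlers. */
public class ThreadSafeTransformerSketch {

    /** Hypothetical interface: one shared instance, looked up once, no per-request state. */
    public interface HandlerFactory {
        ContentHandler createHandler(ContentHandler next);
    }

    /** Example factory: upper-cases character data; it has no mutable fields. */
    public static final class UpperCaseTransformer implements HandlerFactory {
        @Override
        public ContentHandler createHandler(ContentHandler next) {
            // The cheap per-request object: a SAX filter that forwards to the next stage.
            XMLFilterImpl handler = new XMLFilterImpl() {
                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    String upper = new String(ch, start, length).toUpperCase(Locale.ROOT);
                    super.characters(upper.toCharArray(), 0, upper.length());
                }
            };
            handler.setContentHandler(next);
            return handler;
        }
    }

    /** Demo: pushes one "request" worth of text through a shared factory. */
    public static String run(HandlerFactory shared, String input) {
        StringBuilder out = new StringBuilder();
        ContentHandler sink = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        };
        ContentHandler perRequest = shared.createHandler(sink);
        try {
            perRequest.characters(input.toCharArray(), 0, input.length());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

The same `shared` instance can serve any number of concurrent requests, because all mutable state lives in the handler created for each call.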
> 3) Catalina seems to be spending 10% of the pipeline time. Having
> extensively profiled and carefully optimized a servlet engine (JServ)
> I can tell you that this is *WAY* too much. Catalina doesn't seem like
> the best choice to run a loaded servlet-based site (contact
> pier@apache.org if you want to do something about it: he's working on
> Jerry, a super-light servlet engine based on native APR and targeted
> especially for Apache 2.0)
www.betaversion.org has been down for several weeks now...
> 4) java IO takes something from 20% to 35% of the entire request time
> (reading and writing from the socket). This could well be a problem
> with the instrumented JVM since I don't think the JDK 1.4 is that slow
> on IO (especially using the new NIO facilities internally)
>
> 5) most of the time is spent on:
>
> a) XSLT processing (and we knew that)
> b) DTD parsing (and that was a surprise for me!)
>
> Yeah, DTD parsing. No, not for validation, but for entity resolution.
> It seems that even if the parser is non-validating, the DTD is fully
> parsed anyway just to do entity evaluation.
>
> So, discovery #2:
>
> Be careful about DTDs even if the parser is not validating.
>
> Of course, when the cache kicks in and the cached document is read
> directly from the compiled SAX events, we have an incredible speed
> improvement (also because entities are already resolved and hardwired).
>
> 6) Xalan incremental seems to be a little slower than regular Xalan,
> but on multiprocessing machines this might not be the case [Xalan uses
> two threads for incremental processing]
>
> NOTE: Xalan doesn't pool threads when it does that!
>
> So, while perceived performance is better for Xalan in incremental
> mode, the overall load of the machine is reduced if Xalan is used
> normally.
>
> 7) XSLTC *IS* blazingly fast compared to Xalan and is much less
> resource intensive.
>
> Discovery #3:
>
> use XSLTC as much as possible!
>
> NOTE: our current root sitemap.xmap indicates that XSLTC is the default
> XSLT engine for Cocoon 2.1, but the fact is that the XSLTC factory is
> commented out, resulting in running Xalan. We should either remove
> that comment or uncomment the XSLTC factory.
>
> I vote for making XSLTC the default even if this generates a few bug reports.
+1
> 8) Cocoon's hotspot is.... drum roll.... URI matching.
>
> TreeProcessor is complex and adds lots of complexity to the call
> stacks, but it seems to be very lightweight.
I'm happy to hear that :-) The TreeProcessor was designed to be as fast
as possible, even though it is interpreted: it pre-processes everything
that can be pre-processed, and looks up components in advance when they
are ThreadSafe. Call stacks can be impressive, but each frame performs
very little computation.
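The pre-lookup idea can be sketched without Avalon. A node resolves its component once at setup and keeps it if the component is marked thread-safe; otherwise it looks the component up and releases it on every request. `ThreadSafe`, `Component`, `ComponentManager` and `MatchNode` below are hypothetical stand-ins. Avalon does have a real `ThreadSafe` marker interface, but this is not the TreeProcessor's code.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class PreLookupSketch {
    interface ThreadSafe {}                       // marker, stands in for Avalon's
    interface Component { String process(String input); }
    interface ComponentManager {
        Component lookup(String role);
        void release(Component component);
    }

    // A processing node that decides at setup time how to obtain its component.
    static final class MatchNode {
        private final ComponentManager manager;
        private final String role;
        private final Component preLooked;        // non-null only for ThreadSafe components

        MatchNode(ComponentManager manager, String role) {
            this.manager = manager;
            this.role = role;
            Component c = manager.lookup(role);
            if (c instanceof ThreadSafe) {
                preLooked = c;                    // kept for the node's lifetime
            } else {
                manager.release(c);
                preLooked = null;
            }
        }

        String invoke(String input) {
            if (preLooked != null) {
                return preLooked.process(input);  // no lookup on the request path
            }
            Component c = manager.lookup(role);
            try {
                return c.process(input);
            } finally {
                manager.release(c);
            }
        }
    }

    static class SafeUpper implements Component, ThreadSafe {
        public String process(String s) { return s.toUpperCase(); }
    }

    static class PlainUpper implements Component {
        public String process(String s) { return s.toUpperCase(); }
    }

    // Returns how many lookups a node performs for the given number of requests.
    static int lookupsFor(boolean threadSafe, int requests) {
        AtomicInteger lookups = new AtomicInteger();
        ComponentManager manager = new ComponentManager() {
            public Component lookup(String role) {
                lookups.incrementAndGet();
                return threadSafe ? new SafeUpper() : new PlainUpper();
            }
            public void release(Component component) { }
        };
        MatchNode node = new MatchNode(manager, "transformer");
        for (int i = 0; i < requests; i++) {
            node.invoke("req");
        }
        return lookups.get();
    }

    public static void main(String[] args) {
        System.out.println(lookupsFor(true, 100));  // 1: looked up once at setup
        System.out.println(lookupsFor(false, 100)); // 101: setup probe plus one per request
    }
}
```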
> It's URI matching that is the thing that needs more work performance-wise.
>
> Don't get me wrong, my numbers indicate that URI matching takes 3%
> to 8% of response time. Compared to the rest that is nothing, but since
> this is the only part over which we have total control, this is where
> we should concentrate profiling efforts.
Do you mean the WildcardURIMatcher? Is this related to the matching
algorithm, or to the number of patterns that have to be tested when
handling a typical request?
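The two costs asked about above can be separated in a sketch of sitemap-style matching. It assumes the usual wildcard semantics: `*` matches within one path segment, `**` spans `/`, and each wildcard becomes a numbered group such as `{1}`. Each pattern is translated to a `java.util.regex.Pattern` once at setup, so the per-request cost is the regex match itself. Patterns are then tried in order and the first match wins, so a request for the last pattern pays for every miss before it. `WildcardRouter` is hypothetical and is not Cocoon's `WildcardURIMatcher`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardRouter {
    // One compiled pattern and the pipeline it selects.
    private static final class Route {
        final Pattern pattern;
        final String pipeline;
        Route(Pattern pattern, String pipeline) {
            this.pattern = pattern;
            this.pipeline = pipeline;
        }
    }

    private final List<Route> routes = new ArrayList<>();
    private int lastTests;            // patterns tried by the most recent match() call
    private List<String> lastGroups;  // {1}, {2}, ... captured by the winning pattern

    // Translate a wildcard into a regex once: "**" spans '/', "*" stops at '/'.
    static Pattern compile(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '*' && i + 1 < wildcard.length() && wildcard.charAt(i + 1) == '*') {
                re.append("(.*)");
                i++; // consume the second '*'
            } else if (c == '*') {
                re.append("([^/]*)");
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    WildcardRouter add(String wildcard, String pipeline) {
        routes.add(new Route(compile(wildcard), pipeline));
        return this;
    }

    // Try patterns top to bottom; return the first matching pipeline or null.
    String match(String uri) {
        lastTests = 0;
        lastGroups = null;
        for (Route route : routes) {
            lastTests++;
            Matcher m = route.pattern.matcher(uri);
            if (m.matches()) {
                lastGroups = new ArrayList<>();
                for (int g = 1; g <= m.groupCount(); g++) {
                    lastGroups.add(m.group(g));
                }
                return route.pipeline;
            }
        }
        return null;
    }

    int lastTests() { return lastTests; }
    List<String> lastGroups() { return lastGroups; }

    public static void main(String[] args) {
        WildcardRouter router = new WildcardRouter()
            .add("*.html", "page")
            .add("news/*.xml", "news")
            .add("images/**", "images");
        System.out.println(router.match("index.html") + " " + router.lastGroups() + " " + router.lastTests());
        // page [index] 1
        System.out.println(router.match("images/logo/big.png") + " " + router.lastGroups() + " " + router.lastTests());
        // images [logo/big.png] 3
    }
}
```

If profiling showed the time going into the loop rather than into individual matches, the number and order of patterns would be the thing to look at.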
> Ok, that's it. Enough for a rainy Swiss afternoon.
>
> Anyway, Cocoon is pretty optimized for what we could see. So let's be
> happy about it.
Have you compared the respective speeds of 2.0.x and 2.1 on the same
application? It would be interesting to know whether 2.1 performs
better than its predecessor.
Sylvain
--
Sylvain Wallez
Anyware Technologies Apache Cocoon
http://www.anyware-tech.com mailto:sylvain@apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org
Re: [FYI] Profiling Cocoon...
Posted by Giacomo Pati <gi...@apache.org>.
On Sun, 6 Oct 2002, Stefano Mazzocchi wrote:
> NOTE: our current root sitemap.xmap indicates that XSLTC is the default XSLT
> engine for Cocoon 2.1, but the fact is that the XSLTC factory is
> commented out, resulting in running Xalan. We should either remove that
> comment or uncomment the XSLTC factory.
>
> I vote for making XSLTC default even if this generates a few bug reports.
+1
Giacomo
Re: [FYI] Profiling Cocoon...
Posted by Pier Fumagalli <pi...@apache.org>.
"Stefano Mazzocchi" <st...@apache.org> wrote:
> 3) Catalina seems to be spending 10% of the pipeline time. Having
> extensively profiled and carefully optimized a servlet engine (JServ) I
> can tell you that this is *WAY* too much. Catalina doesn't seem like the
> best choice to run a loaded servlet-based site (contact pier@apache.org
> if you want to do something about it: he's working on Jerry, a
> super-light servlet engine based on native APR and targeted especially
> for Apache 2.0)
I _was_ working on it until legolas.betaversion.org died, carrying all the
code to the grave... :-(
Pier