Posted to dev@xalan.apache.org by Jon Smirl <jo...@mediaone.net> on 2000/08/04 06:35:43 UTC

Xalan-C: performance

I spent several hours poking around in the code looking for performance
issues. I came to a single conclusion: DOMString is evil and it's got to go!
When I measured one of my test sheets, Xalan created and destroyed 550,000
objects for a 40 KB source file. 520,000 or more of those objects were
related to DOMString. The performance problems caused by DOMString are so
large that they swamp any secondary problems.

Major evils:
1) two allocs for every string
2) no stack-based version (it looks stack-based, but it still does allocs)
3) multiple threading locks (4 or 5 on some calls)
4) ref counting
5) disagreement between Xalan and Xerces over NULL termination
6) attempting to overlay shorter strings into longer ones

While these may be good features in a DOMString, they aren't what Xalan needs
in its internal strings.
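
To make items 1 and 2 concrete, here is a minimal sketch of the allocation
cost of three string layouts. None of these classes are the real DOMString
or XalanDOMString; HandleBodyString, FlatString, InlineString and the
g_allocs counter are hypothetical names, and char16_t is only an assumed
stand-in for a 16-bit Unicode character type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical counter standing in for a profiler's allocation count.
static std::size_t g_allocs = 0;

// Hypothetical handle/body string: the ref-counted body and the character
// buffer live in two separate heap blocks, so every new string costs two
// allocations. Copies share the body and only bump the reference count.
class HandleBodyString {
    struct Body {
        int refs;
        std::size_t len;
        char16_t* chars;
    };
    Body* body_;

public:
    HandleBodyString(const char16_t* s, std::size_t n)
        : body_(new Body{1, n, nullptr}) {
        ++g_allocs;                      // allocation 1: the shared body
        body_->chars = new char16_t[n];
        ++g_allocs;                      // allocation 2: the characters
        std::memcpy(body_->chars, s, n * sizeof(char16_t));
    }
    HandleBodyString(const HandleBodyString& other) : body_(other.body_) {
        ++body_->refs;
    }
    HandleBodyString& operator=(const HandleBodyString&) = delete;
    ~HandleBodyString() {
        if (--body_->refs == 0) {
            delete[] body_->chars;
            delete body_;
        }
    }
    std::size_t length() const { return body_->len; }
};

// Hypothetical flat string: length and pointer sit in the object itself,
// so only the character buffer is heap-allocated.
class FlatString {
    std::size_t len_;
    char16_t* chars_;

public:
    FlatString(const char16_t* s, std::size_t n)
        : len_(n), chars_(new char16_t[n]) {
        ++g_allocs;                      // the only allocation
        std::memcpy(chars_, s, n * sizeof(char16_t));
    }
    FlatString(const FlatString&) = delete;
    FlatString& operator=(const FlatString&) = delete;
    ~FlatString() { delete[] chars_; }
    std::size_t length() const { return len_; }
};

// Hypothetical stack-based string: characters are stored inline, so a
// local variable of this type never touches the heap.
template <std::size_t N>
class InlineString {
    std::size_t len_;
    char16_t chars_[N];

public:
    InlineString(const char16_t* s, std::size_t n) : len_(n) {
        assert(n <= N);  // sketch only: no heap fallback for longer strings
        std::memcpy(chars_, s, n * sizeof(char16_t));
    }
    std::size_t length() const { return len_; }
};
```

Constructing one of each from the same 12-character name bumps g_allocs by
2, 1 and 0 respectively. The sketch deliberately leaves out locking, so it
says nothing about item 3.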

Changes in Xalan similar to this have been proposed many times (even IFDEF'd
in XalanDOMString.hpp), but something needs to be done so that DOMString
doesn't get spread further (i.e. into users' apps). My proposal (I'll help do
this, but it can't be done without the major coders agreeing) ...

1) switch XalanDOMString to basic_string or vector. Use Unicode chars.
2) define an internal, lightweight, read-only/write-once DOM using the new
string
3) take SAX+ events from Xerces (or an external program) to build this DOM
4) If given a Xerces DOM as input, walk it, generate the events and build an
internal lightweight version. (Don't complain - copying it will be faster
than trying to use the Xerces one.)
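
Steps 2 and 3 could look roughly like the sketch below: a write-once node
table filled in by SAX-style callbacks. This is only an illustration under
assumptions. LiteNode, LiteDomBuilder and XStr are hypothetical names, the
callback signatures are simplified and are not the Xerces SAX interfaces,
and attributes and namespaces are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Assumed string type for the internal DOM (step 1 of the proposal).
typedef std::u16string XStr;

// One node of the hypothetical lightweight DOM. Nodes refer to each other
// by index into a single vector, so there is no per-node heap object.
struct LiteNode {
    enum Kind { Element, Text } kind;
    XStr value;       // element name or text content
    int parent;       // index of the parent, -1 at the top level
    int firstChild;   // index of the first child, -1 if none
    int nextSibling;  // index of the next sibling, -1 if none
};

// Builds the node table from a stream of SAX-like events. Nodes are only
// ever appended; once the last endElement arrives the table is read-only.
class LiteDomBuilder {
    std::vector<LiteNode> nodes_;
    std::vector<int> open_;       // stack of currently open elements
    std::vector<int> lastChild_;  // last child seen so far, per node

    int append(LiteNode::Kind kind, const XStr& value) {
        const int parent = open_.empty() ? -1 : open_.back();
        const int id = static_cast<int>(nodes_.size());
        nodes_.push_back(LiteNode{kind, value, parent, -1, -1});
        lastChild_.push_back(-1);
        if (parent >= 0) {
            if (lastChild_[parent] < 0)
                nodes_[parent].firstChild = id;
            else
                nodes_[lastChild_[parent]].nextSibling = id;
            lastChild_[parent] = id;
        }
        return id;
    }

public:
    void startElement(const XStr& name) {
        open_.push_back(append(LiteNode::Element, name));
    }
    void characters(const XStr& text) { append(LiteNode::Text, text); }
    void endElement() {
        assert(!open_.empty());  // events must be properly nested
        open_.pop_back();
    }
    const std::vector<LiteNode>& nodes() const { return nodes_; }
};
```

For step 4, a walker over an existing DOM would call the same three
methods in document order, so both input paths end up in the same table.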

Once we get the internal DOM, other optimizations can be made, such as
tracking space-preserve via a bit in the DOM node instead of an attribute.
Another idea would be to support two builds, one based on wchar_t and one on
char for people who only deal in Latin-1.
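
Both ideas can be sketched in a few lines. Everything here is hypothetical:
the LITE_DOM_NARROW switch, LiteChar, LiteString, LiteElement,
kPreserveSpace and resolveFlags are made-up names for illustration. The
only outside fact relied on is that xml:space takes the values "preserve"
and "default" and applies to descendants until overridden.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical build switch: define LITE_DOM_NARROW for a char-based,
// Latin-1-only build; otherwise the wide build is used.
#ifdef LITE_DOM_NARROW
typedef char LiteChar;
#else
typedef wchar_t LiteChar;
#endif
typedef std::basic_string<LiteChar> LiteString;

// Bit 0 of the node flags records whether whitespace is preserved.
const std::uint8_t kPreserveSpace = 1u << 0;

struct LiteElement {
    LiteString name;
    std::uint8_t flags;
    bool preservesSpace() const { return (flags & kPreserveSpace) != 0; }
};

// Computes a child's flags while the DOM is being built. xmlSpace is the
// value of the element's own xml:space attribute, or nullptr if it has
// none, in which case the parent's setting is inherited.
inline std::uint8_t resolveFlags(std::uint8_t parentFlags,
                                 const char* xmlSpace) {
    assert((parentFlags & ~kPreserveSpace) == 0);  // only one flag defined
    if (xmlSpace != nullptr && std::strcmp(xmlSpace, "preserve") == 0)
        return static_cast<std::uint8_t>(parentFlags | kPreserveSpace);
    if (xmlSpace != nullptr && std::strcmp(xmlSpace, "default") == 0)
        return static_cast<std::uint8_t>(parentFlags & ~kPreserveSpace);
    return parentFlags;
}
```

With the flag resolved once at build time, a whitespace check during
processing becomes a single bit test on the node instead of an attribute
lookup up the ancestor chain.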

Jon Smirl
jonsmirl@mediaone.net


RE: Xalan-C: performance

Posted by Martin Paré <mp...@ordinox.com>.
I would tend to agree with Jon. I have raised this issue before (about using
std::string), and I think it would be a giant leap for Xalan to get past
this DOMString...

I would also like to point out that Xalan generates a tremendous number of
page faults... I should probably come up with specific numbers, but I would
think that this is also a cause for concern performance-wise.

My 2 cents....

Best Regards
Martin Paré
