Posted to j-dev@xerces.apache.org by Joseph Shraibman <jk...@selectacast.net> on 2004/02/11 04:39:25 UTC
PERFORMANCE: XMLDocumentFragmentScannerImpl.scanAttribute()
I recently profiled a program of mine, and the top two stack traces were:
CPU SAMPLES BEGIN (total = 4019) Tue Feb 10 01:23:18 2004
rank   self  accum   count trace method
   1 17.54% 17.54%     705  4454 java.lang.String.<init>
   2 17.52% 35.06%     704  4463 org.apache.xerces.xni.XMLString.toString
   3  7.86% 42.92%     316  4459 java.lang.StringBuffer.toString
   4  7.56% 50.49%     304  4458 java.lang.StringBuffer.<init>
   5  7.51% 58.00%     302  4475 com.xtenit.xml.PathContentHandler.setCurrPath
   6  7.39% 65.39%     297  4474 java.lang.StringBuffer.toString
   7  6.87% 72.26%     276  4472 java.lang.StringBuffer.<init>
   8  6.82% 79.07%     274  4468 com.xtenit.xml.PathContentHandler.setCurrPath
   9  5.55% 84.62%     223  4469 java.lang.StringBuffer.expandCapacity
  10  1.39% 86.02%      56  4473 java.lang.StringBuffer.expandCapacity
TRACE 4454:
        java.lang.String.<init>(String.java:199)
        org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
        org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
        org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)
TRACE 4463:
        org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
        org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
        org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)
        org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(<Unknown>:Unknown line)
=================================
I notice in scanAttribute():
    scanAttributeValue(fTempString, fTempString2,
                       fAttributeQName.rawname, attributes,
                       attrIndex, isVC, fCurrentElement.rawname);
    attributes.setValue(attrIndex, fTempString.toString());
    attributes.setNonNormalizedValue(attrIndex,
                                     fTempString2.toString());
The only time fTempString is not the same as fTempString2 is when the
value contains either a non-space whitespace character (\r, \n or \t) or
an entity reference (&string;). The vast majority of the time they are in
fact the same (at least with the XML I'm dealing with), so it seems to me
we can get rid of one of the two toString() calls.
There are two ways to do this:
1) in scanAttribute() compare the two XMLStrings.
Something like this:
    scanAttributeValue(fTempString, fTempString2,
                       fAttributeQName.rawname, attributes,
                       attrIndex, isVC, fCurrentElement.rawname);
    String string1 = fTempString.toString();
    attributes.setValue(attrIndex, string1);
    // Reuse string1 when the non-normalized value has the same content.
    String string2 = fTempString2.equals(string1) ? string1 :
                     fTempString2.toString();
    attributes.setNonNormalizedValue(attrIndex, string2);
- or -
2) have scanAttributeValue() return a boolean indicating if it found an
entity or a non-space whitespace char.
I think 2 might be faster, but 1 is easier to implement and makes the
code less messy. Thoughts?
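
For illustration, option 2 might look roughly like the sketch below. It is not the Xerces code: the class AttrValueSketch and the methods scanValue, predefined and values are hypothetical stand-ins, StringBuilder replaces the parser's own buffers, and only the five predefined entities are recognised. The point is only that the scanning step already knows whether it changed anything, so the caller can skip the second toString() without comparing contents.

```java
final class AttrValueSketch {
    // Hypothetical helper: replacement text for the predefined entities, else null.
    static String predefined(String name) {
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
            default:     return null;
        }
    }

    // Hypothetical stand-in for scanAttributeValue(): fills both buffers and
    // returns true only if the normalized value differs from the raw one.
    static boolean scanValue(String input, StringBuilder normalized, StringBuilder raw) {
        boolean differs = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '&') {
                int end = input.indexOf(';', i);
                String replacement = end > i ? predefined(input.substring(i + 1, end)) : null;
                if (replacement != null) {
                    raw.append(input, i, end + 1);   // raw keeps "&name;"
                    normalized.append(replacement);  // normalized gets the expansion
                    differs = true;
                    i = end;
                    continue;
                }
            }
            raw.append(c);
            if (c == '\t' || c == '\n' || c == '\r') {
                normalized.append(' ');              // non-space whitespace becomes a space
                differs = true;
            } else {
                normalized.append(c);
            }
        }
        return differs;
    }

    // Caller side: the second toString() happens only when the flag says it must.
    static String[] values(String input) {
        StringBuilder normalized = new StringBuilder();
        StringBuilder raw = new StringBuilder();
        boolean differs = scanValue(input, normalized, raw);
        String value = normalized.toString();
        String nonNormalized = differs ? raw.toString() : value;
        return new String[] { value, nonNormalized };
    }

    public static void main(String[] args) {
        String[] same = values("plain");
        System.out.println(same[0] == same[1]);      // one String object shared
        String[] diff = values("a\tb &amp; c");
        System.out.println(diff[0] + " | " + diff[1]);
    }
}
```

In the common case the flag is false and a single String serves both setValue() and setNonNormalizedValue().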
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
Re: PERFORMANCE: XMLDocumentFragmentScannerImpl.scanAttribute()
Posted by Joseph Shraibman <jk...@selectacast.net>.
Michael Glavassevich wrote:
> Hello Joseph,
>
> Thanks for your feedback.
>
Thanks for replying. I had a problem with my mail folder and didn't see
your reply until now.
<snip>
>
> As for your first suggestion, there are documents (SVG for instance) out
> in the world which have large attribute values. The parser may be
> iterating over two large strings only to determine that it still needs to
> create a new String object. This would degrade performance for such
> documents.
>
I've already tried that and there was a slight performance loss. I'll
try option number 2 when I get a chance.
Re: PERFORMANCE: XMLDocumentFragmentScannerImpl.scanAttribute()
Posted by Michael Glavassevich <mr...@apache.org>.
Hello Joseph,
Thanks for your feedback.
It would also be nice if we could avoid maintaining two XMLStringBuffers
when the normalized and non-normalized attribute values are equivalent.
We'd have to investigate whether tracking this is costly. We've
made a number of improvements to attribute processing in the last two
releases but there's certainly room for more. I just noticed we're passing
in two parameters to scanAttributeValue which are never accessed by the
method.
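
One way the idea of not maintaining a second buffer could be sketched is shown below. This is only an assumption about how such tracking might work, not how Xerces does it: LazyRawBufferSketch and its methods are hypothetical names, StringBuilder stands in for XMLStringBuffer, and only whitespace normalization is handled (entity references are left out to keep it short). The raw buffer is created the first time the two values diverge, by copying the prefix they still share.

```java
final class LazyRawBufferSketch {
    private final StringBuilder normalized = new StringBuilder();
    private StringBuilder raw;   // stays null while both values are identical

    void append(char c) {
        boolean changes = (c == '\t' || c == '\n' || c == '\r');
        if (changes && raw == null) {
            // First divergence: everything appended so far is common to both values.
            raw = new StringBuilder(normalized);
        }
        if (raw != null) {
            raw.append(c);
        }
        normalized.append(changes ? ' ' : c);
    }

    boolean hasSeparateRaw() {
        return raw != null;
    }

    String value() {
        return normalized.toString();
    }

    // Pass in the String already created by value() so it can be reused.
    String nonNormalizedValue(String value) {
        return raw == null ? value : raw.toString();
    }

    public static void main(String[] args) {
        LazyRawBufferSketch b = new LazyRawBufferSketch();
        for (char c : "no special chars".toCharArray()) {
            b.append(c);
        }
        String v = b.value();
        System.out.println(b.hasSeparateRaw());
        System.out.println(v == b.nonNormalizedValue(v));
    }
}
```

The tracking cost here is one null check per character, which is the kind of overhead that would need measuring before deciding whether it pays off.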
As for your first suggestion, there are documents (SVG for instance) out
in the world which have large attribute values. The parser may be
iterating over two large strings only to determine that it still needs to
create a new String object. This would degrade performance for such
documents.
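
To make the cost concrete, here is a small sketch of a content comparison between a character range and a String. CompareCostSketch, regionEquals and the comparisons counter are hypothetical names for illustration; the real XMLString.equals(String) may be implemented differently. When a tab has been normalized to a space, both values have the same length, so the length check cannot exit early and the loop runs until it reaches the first differing character.

```java
import java.util.Arrays;

final class CompareCostSketch {
    static int comparisons;   // counts characters inspected, for illustration only

    // Compares ch[offset .. offset+length) with s, character by character.
    static boolean regionEquals(char[] ch, int offset, int length, String s) {
        if (s.length() != length) {
            return false;     // cheap exit, but only when the lengths differ
        }
        for (int i = 0; i < length; i++) {
            comparisons++;
            if (ch[offset + i] != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 100000;                       // a large attribute value
        char[] raw = new char[n];
        Arrays.fill(raw, 'x');
        raw[n - 1] = '\t';                    // raw value ends with a tab
        char[] norm = raw.clone();
        norm[n - 1] = ' ';                    // normalized value ends with a space
        String normalized = new String(norm);

        comparisons = 0;
        boolean same = regionEquals(raw, 0, n, normalized);
        System.out.println(same + " after inspecting " + comparisons + " chars");
    }
}
```

In this worst case the whole value is scanned and a second String still has to be created afterwards, so the comparison is pure overhead.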
On Tue, 10 Feb 2004, Joseph Shraibman wrote:
<snip>
---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org