
Posted to j-dev@xerces.apache.org by Joseph Shraibman <jk...@selectacast.net> on 2004/02/11 04:39:25 UTC

PERFORMANCE: XMLDocumentFragmentScannerImpl.scanAttribute()

I recently profiled a program of mine, and the top two stack traces were:


CPU SAMPLES BEGIN (total = 4019) Tue Feb 10 01:23:18 2004
rank   self  accum   count trace method
    1 17.54% 17.54%     705  4454 java.lang.String.<init>
    2 17.52% 35.06%     704  4463 org.apache.xerces.xni.XMLString.toString
    3  7.86% 42.92%     316  4459 java.lang.StringBuffer.toString
    4  7.56% 50.49%     304  4458 java.lang.StringBuffer.<init>
    5  7.51% 58.00%     302  4475 com.xtenit.xml.PathContentHandler.setCurrPath
    6  7.39% 65.39%     297  4474 java.lang.StringBuffer.toString
    7  6.87% 72.26%     276  4472 java.lang.StringBuffer.<init>
    8  6.82% 79.07%     274  4468 com.xtenit.xml.PathContentHandler.setCurrPath
    9  5.55% 84.62%     223  4469 java.lang.StringBuffer.expandCapacity
   10  1.39% 86.02%      56  4473 java.lang.StringBuffer.expandCapacity

TRACE 4454:
         java.lang.String.<init>(String.java:199)
         org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
         org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
         org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)

TRACE 4463:
         org.apache.xerces.xni.XMLString.toString(<Unknown>:Unknown line)
         org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanAttribute(<Unknown>:Unknown line)
         org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(<Unknown>:Unknown line)
         org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(<Unknown>:Unknown line)

=================================

I notice in scanAttribute():

         scanAttributeValue(fTempString, fTempString2,
                            fAttributeQName.rawname, attributes,
                            attrIndex, isVC,fCurrentElement.rawname);
         attributes.setValue(attrIndex, fTempString.toString());
         attributes.setNonNormalizedValue(attrIndex,
                                          fTempString2.toString());


The only time fTempString differs from fTempString2 is when the value
contains a non-space whitespace character (\r, \n, or \t) or an entity
(&string;).  The vast majority of the time they are in fact the same (at
least with the XML I'm dealing with), so it seems to me we can get rid of
one of the two toString() calls.

There are two ways to do this:
1) in scanAttribute() compare the two XMLStrings.
Something like this:

         scanAttributeValue(fTempString, fTempString2,
                            fAttributeQName.rawname, attributes,
                            attrIndex, isVC,fCurrentElement.rawname);
         String string1 = fTempString.toString();
         attributes.setValue(attrIndex, string1);
         String string2 = fTempString2.equals(string1) ? string1
                                                       : fTempString2.toString();
         attributes.setNonNormalizedValue(attrIndex, string2);

- or -

2) have scanAttributeValue() return a boolean indicating if it found an 
entity or a non-space whitespace char.

I think 2 might be faster, but 1 is easier to implement and makes the 
code less messy.  Thoughts?
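
Option 2 could look roughly like the following. This is a minimal
standalone sketch, not the Xerces code: the class AttrValueSketch and the
methods normalize(), expand(), and scanAttribute() are hypothetical, only
the five predefined entities are handled, and the real
scanAttributeValue() works on XMLString/XMLStringBuffer rather than
char[] and StringBuilder.

```java
public class AttrValueSketch {

    /** Hypothetical stand-in for entity handling: predefined entities only. */
    static String expand(String name) {
        switch (name) {
            case "amp":  return "&";
            case "lt":   return "<";
            case "gt":   return ">";
            case "quot": return "\"";
            case "apos": return "'";
            default:
                throw new IllegalArgumentException("unsupported entity: " + name);
        }
    }

    /**
     * Appends the normalized form of raw to out and returns true only if
     * normalization changed something (whitespace replaced or entity expanded).
     */
    static boolean normalize(char[] raw, StringBuilder out) {
        boolean changed = false;
        for (int i = 0; i < raw.length; i++) {
            char c = raw[i];
            if (c == '\t' || c == '\n' || c == '\r') {
                out.append(' ');
                changed = true;
            } else if (c == '&') {
                int end = i + 1;
                while (end < raw.length && raw[end] != ';') {
                    end++;
                }
                if (end == raw.length) {
                    throw new IllegalArgumentException("unterminated reference");
                }
                out.append(expand(new String(raw, i + 1, end - i - 1)));
                changed = true;
                i = end; // the loop increment then steps past ';'
            } else {
                out.append(c);
            }
        }
        return changed;
    }

    /** Returns {normalized, nonNormalized}; one String is shared when nothing changed. */
    static String[] scanAttribute(char[] raw) {
        StringBuilder buf = new StringBuilder(raw.length);
        boolean changed = normalize(raw, buf);
        String value = buf.toString();
        // The second String is built only when the two values really differ.
        String nonNormalized = changed ? new String(raw) : value;
        return new String[] { value, nonNormalized };
    }
}
```

When normalize() returns false, both attribute slots share one String, so
the second String construction is skipped without comparing the two
buffers.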

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: PERFORMANCE: XMLDocumentFragmentScannerImpl.scanAttribute()

Posted by Joseph Shraibman <jk...@selectacast.net>.

Michael Glavassevich wrote:
> Hello Joseph,
> 
> Thanks for your feedback.
> 
Thanks for replying.  I had a problem with my mail folder and didn't see
your reply until now.

<snip>
> 
> As for your first suggestion, there are documents (SVG for instance) out
> in the world which have large attribute values. The parser may be
> iterating over two large strings only to determine that it still needs to
> create a new String object. This would degrade performance for such
> documents.
> 
I've already tried that and there was a slight performance loss.  I'll 
try option number 2 when I get a chance.


Re: PERFORMANCE: XMLDocumentFragmentScannerImpl.scanAttribute()

Posted by Michael Glavassevich <mr...@apache.org>.

Hello Joseph,

Thanks for your feedback.

It would also be nice if we could avoid maintaining two XMLStringBuffers
when the normalized and non-normalized attribute values are equivalent.
We'd have to investigate whether tracking this is costly. We've
made a number of improvements to attribute processing in the last two
releases, but there's certainly room for more. I just noticed we're
passing in two parameters to scanAttributeValue which are never accessed
by the method.
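
One way to picture avoiding the second buffer: keep reading from the raw
characters and allocate the normalized buffer only at the first character
that normalization changes. The sketch below is an assumption about how
that tracking could work, not existing Xerces code; LazyBufferSketch and
scan() are hypothetical names, and only whitespace normalization is shown
(entity handling is left out).

```java
public class LazyBufferSketch {

    /** Returns {normalized, nonNormalized}; a second buffer exists only if needed. */
    static String[] scan(char[] raw) {
        StringBuilder normalized = null; // allocated at the first divergence
        for (int i = 0; i < raw.length; i++) {
            char c = raw[i];
            boolean ws = (c == '\t' || c == '\n' || c == '\r');
            if (ws && normalized == null) {
                // First change: copy the unchanged prefix seen so far.
                normalized = new StringBuilder(raw.length).append(raw, 0, i);
            }
            if (normalized != null) {
                normalized.append(ws ? ' ' : c);
            }
        }
        String nonNormalized = new String(raw);
        String value = (normalized == null) ? nonNormalized : normalized.toString();
        return new String[] { value, nonNormalized };
    }
}
```

The per-character cost of the tracking is one null check, which is the
part that would need measuring against the current two-buffer approach.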

As for your first suggestion, there are documents (SVG for instance) out
in the world which have large attribute values. The parser may be
iterating over two large strings only to determine that it still needs to
create a new String object. This would degrade performance for such
documents.
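
The worst case described above can be made concrete with a small counter.
This is an illustrative sketch only (CompareCostSketch and comparisons()
are hypothetical names, not Xerces code): replacing a tab with a space
keeps the length unchanged, so a length check cannot short-circuit, and a
difference near the end of a long value forces a scan of almost the whole
value before a new String is created anyway.

```java
public class CompareCostSketch {

    /** Counts the character comparisons an equality check would perform. */
    static int comparisons(char[] a, char[] b) {
        if (a.length != b.length) {
            return 0; // lengths differ: no character scan is needed
        }
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count++;
            if (a[i] != b[i]) {
                break; // first mismatch ends the scan
            }
        }
        return count;
    }
}
```

For a value of n characters whose only difference is a tab in the last
position, comparisons() returns n, and the second toString() is still
required afterwards.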


---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org
