Posted to general@xerces.apache.org by "Arnold, Curt" <Cu...@hyprotech.com> on 2000/02/16 20:03:25 UTC

Xerces-C hot spot in XMLAttr::set

I did a profile of a SAX parse using Xerces 1.1.0 d05 and Numega's TrueTime and found what looks to be a very ripe opportunity for a substantial performance gain.

The most obvious hot-spot was XMLAttr::set() which accounted for 26% of the total time.  I've reproduced the code here:

void XMLAttr::set(  const   unsigned int        uriId
                    , const XMLCh* const        attrName
                    , const XMLCh* const        attrPrefix
                    , const XMLCh* const        attrValue
                    , const XMLAttDef::AttTypes type)
{
    // Clean up the old stuff
    delete [] fName;
    fName = 0;
    delete [] fPrefix;
    fPrefix = 0;
    delete [] fValue;
    fValue = 0;

    // And clean up the QName and leave it undone until asked for
    delete [] fQName;
    fQName = 0;

    // And now replicate the stuff into our members
    fType = type;
    fURIId = uriId;
    fName = XMLString::replicate(attrName);
    fPrefix = XMLString::replicate(attrPrefix);
    fValue = XMLString::replicate(attrValue);
}

XMLAttr::set() appears to be called from XMLAttr::XMLAttr() (10 times) and from XMLScanner::scanStartTag() (4564 times) in my sample document.  57% of the time is spent in the delete [] operators and 36% in the replicate calls.

Since attributes are generally about the same size, it would seem much more efficient to reuse the existing buffers instead of explicitly deleting and then reallocating them.  I don't think this requires anything as sophisticated as a string pool.  Maybe just always allocate fName (for example) with room for at least, say, 64 characters.  If the new value is shorter than 64 characters, copy it into the existing fName; otherwise allocate a longer fName.
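
To make the suggestion concrete, here is a rough sketch of the kind of buffer I have in mind.  It is not Xerces code: XChar is a stand-in for XMLCh, and ReusableBuf, xcharLen and kMinChars are names made up for illustration.  It also treats a null source as an empty string, which may not match what the current members do.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for XMLCh; the real type is defined by Xerces-C.
typedef unsigned short XChar;

// Length of a null-terminated XChar string (a null pointer counts as empty).
static std::size_t xcharLen(const XChar* s)
{
    std::size_t n = 0;
    if (s)
        while (s[n]) ++n;
    return n;
}

// Hypothetical helper: a string buffer that reallocates only when it must grow.
class ReusableBuf
{
public:
    // Start with room for kMinChars characters plus the terminator.
    ReusableBuf() : fBuf(new XChar[kMinChars + 1]), fCapacity(kMinChars)
    {
        fBuf[0] = 0;
    }

    ~ReusableBuf() { delete [] fBuf; }

    // Copy src into the existing storage, growing only if it does not fit.
    void set(const XChar* src)
    {
        const std::size_t len = xcharLen(src);
        if (len > fCapacity)
        {
            // Allocate first so the old buffer survives a failed new.
            XChar* bigger = new XChar[len + 1];
            delete [] fBuf;
            fBuf = bigger;
            fCapacity = len;
        }
        if (len)
            std::memcpy(fBuf, src, len * sizeof(XChar));
        fBuf[len] = 0;
    }

    const XChar* get() const { return fBuf; }
    std::size_t capacity() const { return fCapacity; }

private:
    // Not copyable: the class owns a raw buffer.
    ReusableBuf(const ReusableBuf&);
    ReusableBuf& operator=(const ReusableBuf&);

    enum { kMinChars = 64 };   // assumed default, taken from the "say, 64" above

    XChar*      fBuf;
    std::size_t fCapacity;     // characters that fit, excluding the terminator
};
```

With something like this behind fName, fPrefix and fValue, a typical call to set() would do three copies and no heap traffic at all.  The cost is that each attribute keeps its largest buffer for as long as it lives.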

Another anomaly is that I was not doing a validating parse, but I spent 10% of the time in DTDValidator::scanDTD().

I wish I could give some more insights, but I haven't found the trick to set breakpoints on code in Xerces Lib in VC6.  I think it has been covered here before, but I wasn't able to locate it.  If
someone wants to send me a pointer, that would be helpful.