Posted to j-users@xerces.apache.org by ap...@webdav.info on 2006/01/04 14:26:18 UTC
Xerces 250: enumeration performance problem
Dear Xerces committers,
I was evaluating the performance of the XML Schema enumeration language
element and found that it does not scale well as the number of enumeration
values within a single enumeration grows. All other language elements I
tested do scale with the size of the input file (the XML instance).
Here are some figures:
Number of enums              total time
SchemaEnum10         1         11937 ms
SchemaEnum100        3         39971 ms
SchemaEnum1000     129       1292987 ms
SchemaEnum10000  10256     102562157 ms
Here are the schemata:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:simpleType name="DataType">
<xs:restriction base="xs:NMTOKENS">
<xs:enumeration value="1"/>
<xs:enumeration value="2"/>
<xs:enumeration value="3"/>
<xs:enumeration value="4"/>
<xs:enumeration value="5"/>
<xs:enumeration value="6"/>
<xs:enumeration value="7"/>
<xs:enumeration value="8"/>
... continuing up to (10, 100, 1000, 10000)
Here are the instance files:
<?xml version="1.0" encoding="UTF-8"?>
<Schema10>
<Data>1</Data>
<Data>2</Data>
<Data>3</Data>
<Data>4</Data>
<Data>5</Data>
<Data>6</Data>
... continuing up to (10, 100, 1000, 10000)
The reason for this N*N complexity lies in the following method (Xerces 250):
package org.apache.xerces.impl.dv.xs;

public class XSSimpleTypeDecl implements XSSimpleType {

    private void checkFacets(ValidatedInfo validatedInfo)
            throws InvalidDatatypeValueException {
        ....
        // enumeration
        if ((fFacetsDefined & FACET_ENUMERATION) != 0) {
            boolean present = false;
            for (int i = 0; i < fEnumeration.size(); i++) {
                if (fEnumeration.elementAt(i).equals(ob)) {
                    present = true;
                    break;
                }
            }
            if (!present) {
                throw new InvalidDatatypeValueException("cvc-enumeration-valid",
                        new Object[] {content, fEnumeration.toString()});
            }
        }
        .....
The for-loop is executed N/2 times on average for each validated value, where
N is the number of enumeration values. Because the instance files also grow
with N, validating a whole file costs on the order of N*N/2 comparisons, and
this is what causes the poor scaling.
I would like to suggest re-implementing this part using a hashtable instead
of a linear list. Would a committer be interested in implementing this
change? Is this the "right" mailing list for this issue?
I can provide more details if requested.
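To illustrate the suggestion, here is a minimal, standalone sketch of the two
lookup strategies. It is not the Xerces code: the class EnumerationLookupSketch
and its methods containsLinear and containsHashed are hypothetical names chosen
for this example. It also assumes that the enumeration values implement
hashCode() consistently with equals(), which would have to be checked for the
value objects Xerces actually stores in fEnumeration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Vector;

// Hypothetical stand-in for the enumeration check; not the actual Xerces class.
class EnumerationLookupSketch {
    // Mirrors the linear list used in the quoted loop (fEnumeration).
    private final Vector<Object> enumeration;
    // Assumed addition: a hash-based index over the same values.
    private final Set<Object> enumerationSet;

    EnumerationLookupSketch(List<?> values) {
        this.enumeration = new Vector<Object>(values);
        this.enumerationSet = new HashSet<Object>(values);
    }

    // Linear scan, as in the quoted for-loop: O(N) comparisons per value.
    boolean containsLinear(Object ob) {
        for (int i = 0; i < enumeration.size(); i++) {
            if (enumeration.elementAt(i).equals(ob)) {
                return true;
            }
        }
        return false;
    }

    // Hash lookup: expected constant time per value, provided that
    // hashCode() agrees with equals() for the stored objects.
    boolean containsHashed(Object ob) {
        return enumerationSet.contains(ob);
    }

    public static void main(String[] args) {
        // Build 10000 enumeration values "1" .. "10000".
        List<String> values = new ArrayList<String>();
        for (int i = 1; i <= 10000; i++) {
            values.add(Integer.toString(i));
        }
        EnumerationLookupSketch sketch = new EnumerationLookupSketch(values);

        // Both strategies must agree on every value and on a non-member.
        boolean agree = true;
        for (String v : values) {
            agree &= sketch.containsLinear(v) == sketch.containsHashed(v);
        }
        agree &= sketch.containsLinear("x") == sketch.containsHashed("x");
        System.out.println("strategies agree: " + agree);
    }
}
```

With the hashed variant, validating N values against N enumeration entries
would need about N lookups instead of roughly N*N/2 comparisons. The linear
list may still have to be kept if the order of the values matters elsewhere,
for example in the error message.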
Best regards
Juergen Pill