Posted to j-users@xerces.apache.org by ap...@webdav.info on 2006/01/04 14:26:18 UTC

Xerces 250: enumeration performance problem

Dear Xerces committers,

I have been evaluating the performance of the XML Schema enumeration language
element and found that it does not scale well as the number of values within
a single enumeration grows. All other language elements I tested do scale
with the size of the input file (the XML instance).

Here are some figures:

   Number of enums                   total time

   SchemaEnum10            1        11937 ms
   SchemaEnum100           3        39971 ms
   SchemaEnum1000        129      1292987 ms
   SchemaEnum10000     10256    102562157 ms

Here are the schemata:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
	<xs:simpleType name="DataType">
		<xs:restriction base="xs:NMTOKENS">
			<xs:enumeration value="1"/>
			<xs:enumeration value="2"/>
			<xs:enumeration value="3"/>
			<xs:enumeration value="4"/>
			<xs:enumeration value="5"/>
			<xs:enumeration value="6"/>
			<xs:enumeration value="7"/>
			<xs:enumeration value="8"/>
... continuing up to (10, 100, 1000, 10000)

Here are the instance files:

<?xml version="1.0" encoding="UTF-8"?>
<Schema10>
   <Data>1</Data>
   <Data>2</Data>
   <Data>3</Data>
   <Data>4</Data>
   <Data>5</Data>
   <Data>6</Data>
... continuing up to (10, 100, 1000, 10000)


The reason for this N*N complexity lies in the following method (Xerces 250):

package org.apache.xerces.impl.dv.xs;
public class XSSimpleTypeDecl implements XSSimpleType {
    private void checkFacets(ValidatedInfo validatedInfo) throws InvalidDatatypeValueException {

....
        //enumeration
        if ( ((fFacetsDefined & FACET_ENUMERATION) != 0 ) ) {
            boolean present = false;
            for (int i = 0; i < fEnumeration.size(); i++) {
                if (fEnumeration.elementAt(i).equals(ob)) {
                    present = true;
                    break;
                }
            }
            if(!present){
                throw new InvalidDatatypeValueException("cvc-enumeration-valid",
                        new Object[] {content, fEnumeration.toString()});
            }
        }

.....


On average the for-loop runs N/2 iterations per validated value, where N is
the number of enumeration values. Validating an instance with N values
therefore costs about N*N/2 comparisons, which is why the performance does
not scale.
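
To make the arithmetic concrete, here is a small stand-alone sketch that counts
the equals() calls made by a loop of the same shape. It is not Xerces code: the
class LinearScanCost and the method comparisons are hypothetical names, and it
assumes the instance contains each of the N enumeration values exactly once,
as in the test files above.

```java
import java.util.Vector;

public class LinearScanCost {
    // Counts equals() calls made by a linear scan that stops at the first
    // match. This only mirrors the shape of the quoted loop; it is an
    // illustration, not the Xerces implementation.
    static long comparisons(int n) {
        Vector<String> enumeration = new Vector<>();
        for (int i = 1; i <= n; i++) {
            enumeration.add(Integer.toString(i));
        }
        long count = 0;
        // Validate each of the n values once, as the instance files do.
        for (int v = 1; v <= n; v++) {
            String value = Integer.toString(v);
            for (int i = 0; i < enumeration.size(); i++) {
                count++;
                if (enumeration.elementAt(i).equals(value)) {
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000, 10000}) {
            System.out.println(n + " values: " + comparisons(n) + " comparisons");
        }
    }
}
```

The count is N*(N+1)/2, so the four sizes give 55, 5050, 500500 and 50005000
comparisons: each tenfold increase in N costs roughly a hundredfold more work.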

I would like to suggest re-implementing this part using a hash table instead
of a linear list. Would a committer be interested in implementing this
change? Is this the "right" mailing list for this issue?
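
To illustrate the suggestion, here is a minimal sketch of a hash-based
membership test. The class EnumerationLookup and its method isAllowed are
hypothetical and not part of Xerces. The sketch also assumes that the
enumeration value objects implement hashCode() consistently with equals(),
which would have to be checked for the actual Xerces value types.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class EnumerationLookup {
    private final Set<Object> values;

    // Build the set once, e.g. when the enumeration facet is applied,
    // rather than on every validation call.
    public EnumerationLookup(List<?> enumeration) {
        this.values = new HashSet<>(enumeration);
    }

    // Expected constant-time lookup in place of the linear scan.
    // Assumes hashCode() agrees with equals() for the stored values.
    public boolean isAllowed(Object value) {
        return values.contains(value);
    }

    public static void main(String[] args) {
        EnumerationLookup lookup =
                new EnumerationLookup(Arrays.asList("1", "2", "3"));
        System.out.println(lookup.isAllowed("2"));  // prints "true"
        System.out.println(lookup.isAllowed("42")); // prints "false"
    }
}
```

With such a lookup, validating N values against N enumeration entries should
take time roughly proportional to N instead of N*N. The original list would
still be needed wherever the declaration order of the values matters, for
example in the error message.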

I can provide more details if requested.


Best regards

Juergen Pill
