Posted to c-dev@xerces.apache.org by "David Earlam (JIRA)" <xe...@xml.apache.org> on 2005/03/04 19:29:47 UTC

[jira] Updated: (XERCESC-1363) DataTypeListValidator extraordinarily slow for long lists

     [ http://issues.apache.org/jira/browse/XERCESC-1363?page=history ]

David Earlam updated XERCESC-1363:
----------------------------------

    Attachment: pq.zip

I've attached a zip file (it's only 5 kB but expands to 500 kB).

>  DataTypeListValidator extraordinarily slow for long lists
> -----------------------------------------------------------
>
>          Key: XERCESC-1363
>          URL: http://issues.apache.org/jira/browse/XERCESC-1363
>      Project: Xerces-C++
>         Type: Bug
>   Components: Validating Parser (Schema) (Xerces 1.5 or up only)
>     Versions: 2.5.0, 2.6.0
>  Environment: Windows 2000
>     Reporter: David Earlam
>     Priority: Minor
>  Attachments: pq.zip
>
> Validating an XML instance against a schema with an unbounded xsd:list type can take far more than O(n) processing time, where n is the number of items in the list.
> To reproduce, use this schema:
> pq.xsd
> <?xml version="1.0" encoding="utf-8" ?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> 	xmlns:pqns="http://swsis.cambridge.arm.com/~dearlam/xercestest/" targetNamespace="http://swsis.cambridge.arm.com/~dearlam/xercestest/"
> 	elementFormDefault="qualified" version="0.1">
> 	<xs:annotation>
> 		<xs:documentation xml:lang="en">
> 		XML schema for Hofstadter's Gödel pq-System.
> 		
> 		Test data for list data type validation.
> 	 </xs:documentation>
> 	</xs:annotation>
> 	<xs:element name="pqData" type="pqns:pqDataType"></xs:element>
> 	<xs:complexType name="pqDataType">
> 		<xs:complexContent>
> 			<xs:restriction base="xs:anyType">
> 				<xs:sequence minOccurs="1" maxOccurs="1">
> 					<xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
> 					<xs:element name="p" type="xs:string" xsi:nill="true"></xs:element>
> 					<xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
> 					<xs:element name="q" type="xs:string" xsi:nill="true"></xs:element>
> 					<xs:element name="dashes" type="pqns:dashBlockType"></xs:element>
> 				</xs:sequence>
> 			</xs:restriction>
> 		</xs:complexContent>
> 	</xs:complexType>
> 	<xs:complexType name="porqType">
> 		<xs:simpleContent>
> 			<xs:extension base="xs:string"></xs:extension>
> 		</xs:simpleContent>
> 	</xs:complexType>
> 	<xs:complexType name="dashBlockType">
> 		<xs:simpleContent>
> 			<xs:extension base="pqns:dataDashes"></xs:extension>
> 		</xs:simpleContent>
> 	</xs:complexType>
> 	<xs:simpleType name="Dash">
> 		<xs:restriction base="xs:string">
> 			<xs:pattern value="[\-]"></xs:pattern>
> 		</xs:restriction>
> 	</xs:simpleType>
> 	<xs:simpleType name="dataDashes">
> 		<xs:restriction base="pqns:DashList">
> 			<xs:minLength value="0" />
> 		</xs:restriction>
> 	</xs:simpleType>
> 	<xs:simpleType name="DashList">
> 		<xs:list itemType="pqns:Dash"></xs:list>
> 	</xs:simpleType>
> </xs:schema>
> and this XML file
> pqData0.xml
> <?xml version="1.0" encoding="utf-8" ?> 
> <pqData xmlns='http://swsis.cambridge.arm.com/~dearlam/xercestest/'
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="http://swsis.cambridge.arm.com/~dearlam/xercestest/
>  http://swsis.cambridge.arm.com/~dearlam/xercestest/pq.xsd">
> <dashes>
> - -
> </dashes>
> <p/>
> <dashes>-</dashes>
> <q/>
> <dashes>-</dashes>
> </pqData>
> (replacing swsis.cambridge.arm.com/~dearlam/xercestest with your location)
> Then use 
>   domprint -wfpp=on pqData0.xml
> and
>   domprint -n -s -wfpp=on pqData0.xml
> to print the XML without and with validation, respectively.
> Both print in an equally short time. OK.
> Now save a copy of pqData0.xml as pqData1.xml and, in the first <dashes> element, replace
> - - 
> with 4000 lines of
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> This gives a 500 kB file (which mimics my real data).
> If you then try
>   domprint -wfpp=on pqData1.xml
> and
>   domprint -n -s -wfpp=on pqData1.xml 
> the first prints instantly (pipe it to NUL if you like), but the second consumes 99% CPU for 230 seconds before it prints.
> That's only about 2 kB per second!
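The large test file described above can also be generated instead of edited by hand. The following is a minimal sketch, not part of the original report: the function names makeDashLine and writePqData are hypothetical, and 62 dashes per line is an assumption chosen so that 4000 lines come to roughly 500 kB. The namespace string passed in should be your own location, ending in a slash, as in pqData0.xml.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Builds one line of space-separated dashes, e.g. "- - -" for count == 3.
std::string makeDashLine(std::size_t count) {
    assert(count > 0);
    std::string line;
    line.reserve(2 * count - 1);
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0) line += ' ';
        line += '-';
    }
    return line;
}

// Writes a pqData instance whose first <dashes> element contains
// `lines` lines of `dashesPerLine` dashes each. `ns` is the target
// namespace (assumed to end with '/'), matching the layout of pqData0.xml.
void writePqData(std::ostream& out, const std::string& ns,
                 std::size_t lines, std::size_t dashesPerLine) {
    const std::string line = makeDashLine(dashesPerLine);
    out << "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n"
        << "<pqData xmlns='" << ns << "'\n"
        << "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n"
        << "xsi:schemaLocation=\"" << ns << "\n " << ns << "pq.xsd\">\n"
        << "<dashes>\n";
    for (std::size_t i = 0; i < lines; ++i) {
        out << line << '\n';
    }
    out << "</dashes>\n<p/>\n<dashes>-</dashes>\n<q/>\n<dashes>-</dashes>\n"
        << "</pqData>\n";
}
```

To produce pqData1.xml, call writePqData from a small main with a std::ofstream, 4000 lines and 62 dashes per line, then run the two domprint commands above against the result.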
> --
> (My suspicion is that XMLString::tokenizeString, via subString(), recalculates the string length
> far too many times...)
> kind regards,
> David
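
To illustrate why the suspected pattern would scale so badly, here is a small hypothetical sketch. It is not the Xerces-C++ implementation, and the names TokenizeResult, tokenizeRescanning and tokenizeOnce are invented for this example. It contrasts a tokenizer that re-measures the whole input for every token with one that measures it once, and counts the characters visited while measuring as a rough proxy for work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical illustration only; NOT the Xerces-C++ code.
struct TokenizeResult {
    std::vector<std::string> tokens;
    std::size_t charsScanned = 0;  // characters visited by length scans
};

static bool isSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Re-measures the full input once per token: each strlen is O(n), so a
// list with k items costs O(n * k), which is quadratic for long lists.
TokenizeResult tokenizeRescanning(const char* text) {
    assert(text != nullptr);
    TokenizeResult r;
    std::size_t pos = 0;
    for (;;) {
        const std::size_t len = std::strlen(text);  // full rescan every pass
        r.charsScanned += len;
        while (pos < len && isSpace(text[pos])) ++pos;
        if (pos >= len) break;
        const std::size_t start = pos;
        while (pos < len && !isSpace(text[pos])) ++pos;
        r.tokens.emplace_back(text + start, pos - start);
    }
    return r;
}

// Measures the input once and reuses the length: O(n) overall.
TokenizeResult tokenizeOnce(const char* text) {
    assert(text != nullptr);
    TokenizeResult r;
    const std::size_t len = std::strlen(text);
    r.charsScanned += len;
    std::size_t pos = 0;
    for (;;) {
        while (pos < len && isSpace(text[pos])) ++pos;
        if (pos >= len) break;
        const std::size_t start = pos;
        while (pos < len && !isSpace(text[pos])) ++pos;
        r.tokens.emplace_back(text + start, pos - start);
    }
    return r;
}
```

Both functions return the same tokens; only the charsScanned counter differs. For a 500 kB list with a few hundred thousand items, the rescanning variant would visit on the order of 10^11 characters, which is the kind of growth that could turn an instant parse into minutes. Whether this is what actually happens inside tokenizeString is exactly the open question in the report.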
