Posted to c-dev@xerces.apache.org by "Leonid G (JIRA)" <xe...@xml.apache.org> on 2009/03/20 22:39:50 UTC

[jira] Created: (XERCESC-1857) unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute

unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute
---------------------------------------------------------------------------------------------------

                 Key: XERCESC-1857
                 URL: https://issues.apache.org/jira/browse/XERCESC-1857
             Project: Xerces-C++
          Issue Type: Bug
          Components: Validating Parser (XML Schema)
    Affects Versions: 3.0.1
         Environment: Linux, Red Hat Enterprise Linux ES release 4 (Nahant Update 4), gcc 3.4.6, 32 bits.
            Reporter: Leonid G


I have prepared a fairly small schema with an 'xs:unique' constraint on a nested element.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="el1">
    <xs:element name="el2">
      <xs:element name="el3">
        <xs:attribute name="key" type="xs:string" use="required"/>
        <!-- <xs:attribute name="key" type="xs:unsignedInt" use="required"/> -->
      </xs:element>

      <xs:unique name="uniqCreativeKey">
        <xs:selector xpath="cam:creative"/>
        <xs:field xpath="@key"/>
      </xs:unique>
    </xs:element>
  </xs:element>
</xs:schema>

I need to validate a relatively large XML document (~100 KB) against this schema.

Load + validation take about 10 s and the process consumes ~100 MB of memory. I am using SAX, not DOM.
If I change the schema definition slightly so that the type of the "unique" attribute is unsignedInt instead of string, then the code runs for ~1 min 10 s (7 times slower) and also uses about 100 MB of RAM.

Thus I have two questions:
1. Why does the parser require 100 MB of RAM to load and validate a 100 KB file?
2. Why does changing the attribute type from xs:string to xs:unsignedInt slow validation down 7 times?

I built Xerces myself from source:
./configure --disable-shared --enable-netaccessor-socket --enable-msgloader-inmemory --prefix=/mnt/builder/3rdParty/xml/xerces-c/3.0.1 --libdir=/mnt/builder/3rdParty/xml/xerces-c/3.0.1/lib.gcc-3.4.6


I can provide the schema, the XML, and the code that I am running.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: c-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: c-dev-help@xerces.apache.org


[jira] Updated: (XERCESC-1857) unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute

Posted by "Leonid G (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leonid G updated XERCESC-1857:
------------------------------

    Attachment: xml_validation_test.zip

The attached archive contains the C++ source for my test, the schema, and an XML file with data.



[jira] Commented: (XERCESC-1857) unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute

Posted by "David Bertoni (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESC-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12685370#action_12685370 ] 

David Bertoni commented on XERCESC-1857:
----------------------------------------

Yes, please zip up the schema files, the XML document, and the code so someone can investigate.



[jira] Resolved: (XERCESC-1857) unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute

Posted by "Alberto Massari (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alberto Massari resolved XERCESC-1857.
--------------------------------------

       Resolution: Fixed
    Fix Version/s: 3.1.0
         Assignee: Alberto Massari

The excessive memory usage has been fixed by rev 708224 of the trunk (see http://svn.apache.org/viewvc?view=rev&revision=708224).
In general, comparing numbers is more expensive than comparing strings: the literal values must first be parsed into decimal numbers and then compared, whereas strings need only a single string comparison.
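
As a rough illustration of the difference described above, here is a minimal sketch in standard C++. It is not Xerces code and makes no claim about how Xerces implements the comparison; the helper names (parseUnsignedInt, sameStringKey, sameNumericKey) are hypothetical, and the parser accepts plain digit sequences only, which is a simplification of the xs:unsignedInt lexical rules.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper (not a Xerces API): parse a literal consisting only of
// decimal digits into a 32-bit unsigned value. Returns false on any other
// input or when the value does not fit into 32 bits.
bool parseUnsignedInt(const std::string& literal, std::uint32_t& out) {
    if (literal.empty()) return false;
    std::uint64_t value = 0;
    for (char c : literal) {
        if (c < '0' || c > '9') return false;            // reject non-digits
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
        if (value > std::numeric_limits<std::uint32_t>::max()) return false;
    }
    out = static_cast<std::uint32_t>(value);
    return true;
}

// String-typed keys: a single lexical comparison.
bool sameStringKey(const std::string& a, const std::string& b) {
    return a == b;
}

// Numeric keys: both literals are parsed first, then the values are compared.
bool sameNumericKey(const std::string& a, const std::string& b) {
    std::uint32_t x = 0;
    std::uint32_t y = 0;
    return parseUnsignedInt(a, x) && parseUnsignedInt(b, y) && x == y;
}

// Usage: "007" and "7" are different strings but the same number.
void keyComparisonExamples() {
    assert(!sameStringKey("007", "7"));
    assert(sameNumericKey("007", "7"));
}
```

The numeric path does extra work per comparison (two parses plus a value comparison), which is the kind of overhead the comment above refers to.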

> unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute
> ---------------------------------------------------------------------------------------------------
>
>                 Key: XERCESC-1857
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1857
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Validating Parser (XML Schema)
>    Affects Versions: 3.0.1
>         Environment: Linux, Red Hat Enterprise Linux ES release 4 (Nahant Update 4), gcc 3.4.6, 32 bits.
>            Reporter: Leonid Gershanovich
>            Assignee: Alberto Massari
>             Fix For: 3.1.0
>
>         Attachments: validation_test.zip
>


[jira] Updated: (XERCESC-1857) unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute

Posted by "Leonid G (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leonid G updated XERCESC-1857:
------------------------------

    Attachment: validation_test.zip

Please see the attached file.

I would also like to mention that when I either remove 'xs:unique' from the schema or turn off 'Identity Constraint Checking' with
parser->setFeature(XERCES_CPP_NAMESPACE::XMLUni::fgXercesIdentityConstraintChecking, false);

the execution time drops from 10 seconds to 0.024 s, which is almost 3 orders of magnitude.

I also observed a similar pattern when I tried DOM instead of SAX.
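
For reference, the ratios behind the timings quoted in this issue can be checked with a few lines of standard C++. This is only an arithmetic sketch; the function names (slowdown, ordersOfMagnitude) are made up for illustration, and the inputs are the figures reported above (10 s, 0.024 s, and roughly 70 s).

```cpp
#include <cassert>
#include <cmath>

// Ratio of two durations given in seconds.
double slowdown(double slowerSeconds, double fasterSeconds) {
    assert(fasterSeconds > 0.0);
    return slowerSeconds / fasterSeconds;
}

// Number of decimal orders of magnitude separating two durations.
double ordersOfMagnitude(double slowerSeconds, double fasterSeconds) {
    return std::log10(slowdown(slowerSeconds, fasterSeconds));
}
```

With the reported numbers, slowdown(10.0, 0.024) is roughly 417 and ordersOfMagnitude(10.0, 0.024) is roughly 2.6, while slowdown(70.0, 10.0) gives the factor of 7 between the xs:unsignedInt and xs:string runs.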



[jira] Updated: (XERCESC-1857) unique/key constraint on non string field/attriibute slows validation 6x times vs string attriibute

Posted by "Leonid G (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESC-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leonid G updated XERCESC-1857:
------------------------------

    Attachment:     (was: xml_validation_test.zip)
