Posted to j-dev@xerces.apache.org by "Mukul Gandhi (Jira)" <xe...@xml.apache.org> on 2021/10/06 12:48:00 UTC

[jira] [Comment Edited] (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs

    [ https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17424959#comment-17424959 ] 

Mukul Gandhi edited comment on XERCESJ-1227 at 10/6/21, 12:47 PM:
------------------------------------------------------------------

Here's what looks like a workaround (with XML Schema 1.1, using XercesJ 2.12.1) for some of these performance issues.

Consider the following two XSD documents:

1)

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
      <xs:complexType>
         <xs:choice maxOccurs="5000">
            <xs:element name="a" type="xs:string"/>
            <xs:element name="b" type="xs:string"/>
         </xs:choice>
      </xs:complexType>
   </xs:element>

</xs:schema>

2)

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
      <xs:complexType>
         <xs:choice maxOccurs="unbounded">
            <xs:element name="a" type="xs:string"/>
            <xs:element name="b" type="xs:string"/>
         </xs:choice>
         <xs:assert test="count(*) le 5000"/>
      </xs:complexType>
   </xs:element>

</xs:schema>

The following is an XML instance document:

<?xml version="1.0"?>
 <X>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
     <a>hello</a>
 </X>

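With the examples above, I validated the provided XML instance document against both schemas. Schema document 1) took 14865 ms and schema document 2) took 42 ms, roughly 350 times faster. I used the XercesJ sample jaxp.SourceValidator to do the XSD validation.

The difference is consistent with the issue description below. A bounded maxOccurs on a choice is fully expanded when the DFA is built. Schema 2) instead uses an unbounded choice and moves the upper limit into an assertion. Because each iteration of the choice contributes exactly one child element, count(*) le 5000 expresses the same upper bound. The performance difference looks nice to me.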
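The JDK's built-in JAXP validator implements XML Schema 1.0 only, so it cannot evaluate xs:assert. The following is a minimal sketch under that assumption. It validates against schema 2) with the assertion removed, then applies the count(*) le 5000 bound in application code.

The class BoundedChoiceCheck, its helper methods, and the DOM-based counting are hypothetical choices for illustration. They are not part of Xerces or of jaxp.SourceValidator.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class BoundedChoiceCheck {
    // Same limit as the assertion count(*) le 5000 in schema 2).
    static final int MAX_CHILDREN = 5000;

    // Schema 2) without its xs:assert, so that an XSD 1.0 validator accepts it.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='X'><xs:complexType>"
        + "<xs:choice maxOccurs='unbounded'>"
        + "<xs:element name='a' type='xs:string'/>"
        + "<xs:element name='b' type='xs:string'/>"
        + "</xs:choice></xs:complexType></xs:element></xs:schema>";

    private static final Schema SCHEMA = compile();

    private static Schema compile() {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return sf.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    /** Validates against the unbounded choice, then enforces the upper bound in code. */
    static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // JAXP validators expect a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            // Throws SAXParseException for any child other than <a> or <b>.
            SCHEMA.newValidator().validate(new DOMSource(doc));
            // Emulate count(*): count the element children of the root element.
            int count = 0;
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    count++;
                }
            }
            return count <= MAX_CHILDREN;
        } catch (SAXException e) {
            return false; // not well-formed, or not valid against the schema
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    /** Builds an instance like the one above with n <a>hello</a> children. */
    static String instance(int n) {
        StringBuilder sb = new StringBuilder("<X>");
        for (int i = 0; i < n; i++) {
            sb.append("<a>hello</a>");
        }
        return sb.append("</X>").toString();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean ok = isValid(instance(22));
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("valid=" + ok + ", " + ms + " ms");
    }
}
```

The DOM is used only to keep the sketch short. For large documents, a streaming count (SAX or StAX) would avoid holding the whole tree in memory.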



> Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs
> ----------------------------------------------------------------------------------------------------
>
>                 Key: XERCESJ-1227
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1227
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: XML Schema 1.0 Structures, XML Schema 1.1 Structures
>    Affects Versions: 2.9.0
>            Reporter: Michael Glavassevich
>            Priority: Minor
>              Labels: gsoc, gsoc2014, mentor
>
> We now handle large minOccurs/maxOccurs on element/wildcard particles more gracefully by creating a compact representation in the DFA and using counters to check the occurrence constraints. However, we will still fully expand the content model for minOccurs/maxOccurs on sequences and choices, which could still lead to an OutOfMemoryError or very poor performance (i.e. it could still take several minutes to build the DFA). Sequences, choices and nested minOccurs/maxOccurs are somewhat trickier to handle. We would need a more general solution than the one implemented for elements and wildcards to improve those.
> With the introduction of XML Schema 1.1 support we would also need to consider how to improve this for the enhanced xs:all model groups.
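The issue text describes how element and wildcard particles with large bounds are already handled: a compact DFA plus counters. The following is a minimal sketch of that counter idea applied to the simplest case from this report, a repeated choice whose branches are single elements, (a | b){min,max}. It keeps one counter instead of expanding maxOccurs copies of the choice.

CountedChoice is a hypothetical class and is not how Xerces implements its content models. Nested particles, which the issue calls trickier, are deliberately not handled.

```java
import java.util.List;
import java.util.Set;

public class CountedChoice {
    private final Set<String> branches; // element names allowed by the choice, e.g. {"a", "b"}
    private final int minOccurs;
    private final int maxOccurs;        // Integer.MAX_VALUE stands in for "unbounded"

    public CountedChoice(Set<String> branches, int minOccurs, int maxOccurs) {
        this.branches = branches;
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
    }

    /** Checks a sequence of child names using one counter, independent of maxOccurs. */
    public boolean matches(List<String> children) {
        int count = 0;
        for (String name : children) {
            if (!branches.contains(name)) {
                return false; // not one of the choice branches
            }
            if (++count > maxOccurs) {
                return false; // upper bound exceeded
            }
        }
        return count >= minOccurs;
    }

    /**
     * Rough number of leaf positions a naive expansion would create:
     * one copy of every branch per allowed iteration (meaningful only for a finite maxOccurs).
     */
    public long expandedLeafCount() {
        return (long) maxOccurs * branches.size();
    }
}
```

For schema 1), the counter approach needs a single integer, whereas a naive expansion would create on the order of 5000 × 2 = 10000 leaf positions before the DFA is even built.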



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
