Posted to j-dev@xerces.apache.org by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org> on 2007/02/23 18:12:05 UTC

[jira] Created: (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested minOccurs/maxOccurs

Poor performance / OutOfMemoryError for sequences, choices and nested minOccurs/maxOccurs
-----------------------------------------------------------------------------------------

                 Key: XERCESJ-1227
                 URL: https://issues.apache.org/jira/browse/XERCESJ-1227
             Project: Xerces2-J
          Issue Type: Bug
          Components: XML Schema Structures
    Affects Versions: 2.9.0
            Reporter: Michael Glavassevich
            Priority: Minor


We now handle large minOccurs/maxOccurs on element/wildcard particles more gracefully by creating a compact representation in the DFA and using counters to check the occurrence constraints. However, we still fully expand the content model for minOccurs/maxOccurs on sequences and choices, which can still lead to an OutOfMemoryError or very poor performance (building the DFA could still take several minutes). Sequences, choices and nested minOccurs/maxOccurs are somewhat trickier to handle; improving them would require a more general solution than the one implemented for elements and wildcards.
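A rough way to see why full expansion is costly: the following minimal Java sketch models unrolling under a simplified assumption (not taken from the Xerces sources). A particle with a finite maxOccurs n is copied n times, an unbounded particle max(minOccurs, 1) times, and the sketch counts the resulting leaf positions. ExpansionCost and its record Particle are hypothetical names used only for illustration.

```java
import java.util.List;

/** Hypothetical, simplified model of content-model unrolling (not Xerces code). */
public class ExpansionCost {
    static final int UNBOUNDED = -1;

    /** A particle: an element leaf (no children) or a sequence/choice group. */
    record Particle(int minOccurs, int maxOccurs, List<Particle> children) {
        static Particle element(int min, int max) {
            return new Particle(min, max, List.of());
        }
        static Particle group(int min, int max, Particle... kids) {
            return new Particle(min, max, List.of(kids));
        }
    }

    /** Number of leaf positions if every particle is fully unrolled. */
    static long expandedLeaves(Particle p) {
        // An element contributes one leaf; a group contributes the sum of its children.
        long term = p.children().isEmpty() ? 1
                : p.children().stream().mapToLong(ExpansionCost::expandedLeaves).sum();
        // Assumption: finite maxOccurs n => n copies; unbounded => max(minOccurs, 1) copies.
        long copies = p.maxOccurs() == UNBOUNDED ? Math.max(p.minOccurs(), 1) : p.maxOccurs();
        return copies * term;
    }

    public static void main(String[] args) {
        // Roughly <sequence maxOccurs="1000"><element maxOccurs="1000"/><element/></sequence>
        Particle nested = Particle.group(1, 1000,
                Particle.element(1, 1000), Particle.element(1, 1));
        System.out.println(expandedLeaves(nested)); // 1001000
        // The same kind of element with maxOccurs="unbounded" stays small.
        System.out.println(expandedLeaves(Particle.element(0, UNBOUNDED))); // 1
    }
}
```

Nesting multiplies the counts, which is why nested minOccurs/maxOccurs are singled out above. With the counter approach described for elements, the inner element would not be unrolled, but the outer sequence still would be (roughly 2,000 leaves in this model).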

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: j-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-dev-help@xerces.apache.org


[jira] Commented: (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs

Posted by "Georg Fischer (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519144 ] 

Georg Fischer commented on XERCESJ-1227:
----------------------------------------

With Xerces 2.9.0 I had persistent OutOfMemoryError problems with an ISO 20022 schema which was modified for SEPA. Even -Xmx512m did not help.
The original schema at http://www.iso20022.org/index.cfm?item_id=60055 for pain.001.001.02
works fine. The modified SEPA schema has the following restrictions compared to the ISO schema:

427c427
< 			<xs:element name="CdtTrfTxInf" type="CreditTransferTransactionInformation2" maxOccurs="unbounded"/>
---
> 			<xs:element name="CdtTrfTxInf" type="CreditTransferTransactionInformation2" maxOccurs="9999999"/>
527c527
< 			<xs:element name="RfrdDocAmt" type="ReferredDocumentAmount1Choice" minOccurs="0" maxOccurs="unbounded"/>
---
> 			<xs:element name="RfrdDocAmt" type="ReferredDocumentAmount1Choice" minOccurs="0" maxOccurs="999999"/>

Instances validate with this schema in XMLSpy, but Xerces blows up.

I feel that such high upper bounds do not make much sense, but I would very much like to be able to validate against such schemata with Xerces. If you cannot rewrite the DFA in general, my suggestion for the Xerces code would be:
(a) to silently replace such high maxOccurs values with "unbounded", or
(b) to introduce a special feature which allows the user to control such a replacement.
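Until something like (a) or (b) exists in Xerces itself, a similar effect can be had by preprocessing the schema before compiling it. Below is a minimal sketch using only the standard JAXP DOM API; the class MaxOccursRelaxer, its methods and the threshold of 5000 are hypothetical choices, not an existing Xerces feature. Be aware that it loosens the schema: instances exceeding the original bound will then be accepted.

```java
import java.io.ByteArrayInputStream;
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical schema preprocessor: replaces very large maxOccurs with "unbounded". */
public class MaxOccursRelaxer {
    static final String XS_NS = "http://www.w3.org/2001/XMLSchema";

    /** Rewrites every particle whose maxOccurs exceeds threshold; returns how many. */
    static int relax(Document schema, long threshold) {
        BigInteger limit = BigInteger.valueOf(threshold);
        int changed = 0;
        // All elements in the XML Schema namespace; only particles (element, any,
        // sequence, choice, group, all) can carry a maxOccurs attribute.
        NodeList nodes = schema.getElementsByTagNameNS(XS_NS, "*");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String max = e.getAttribute("maxOccurs").trim(); // "" when absent
            if (max.isEmpty() || max.equals("unbounded")) continue;
            // BigInteger because xs:nonNegativeInteger has no upper bound.
            if (new BigInteger(max).compareTo(limit) > 0) {
                e.setAttribute("maxOccurs", "unbounded");
                changed++;
            }
        }
        return changed;
    }

    /** Namespace-aware parse of a schema held in a string. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getElementsByTagNameNS
            return f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse schema", ex);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<xs:schema xmlns:xs='" + XS_NS + "'>"
                + "<xs:element name='Doc'><xs:complexType><xs:sequence>"
                + "<xs:element name='CdtTrfTxInf' type='xs:string' maxOccurs='9999999'/>"
                + "<xs:element name='Nm' type='xs:string' maxOccurs='10'/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>");
        System.out.println(relax(d, 5000)); // 1
    }
}
```

The rewritten Document could then be compiled with javax.xml.validation.SchemaFactory.newSchema(new DOMSource(d, systemId)), so that relative schemaLocation values still resolve; included or imported schema documents would need the same preprocessing.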


[jira] Commented: (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs

Posted by "Christopher Sahnwaldt (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475866 ] 

Christopher Sahnwaldt commented on XERCESJ-1227:
------------------------------------------------

A while ago, I rewrote org.apache.xerces.impl.dtd.models.CMStateSet.java to use a sparse array and achieved a speedup factor of between 10 and 20 for our problem with maxOccurs="999". I hope to find the time to clean up the code and submit a patch in the next few days.
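As far as we understand it, CMStateSet is the small bit-set class the DFA builder uses for sets of leaf positions. Once maxOccurs="999" is unrolled there are many positions, while many of these sets hold only a few of them, so a dense bit array spends memory and time on zeros. The following minimal Java sketch shows a sparse alternative backed by a sorted int array; SparseStateSet and its methods are hypothetical, and this is neither Chris's patch nor the Xerces API.

```java
import java.util.Arrays;

/** Hypothetical sparse set of non-negative leaf positions, kept sorted. */
public final class SparseStateSet {
    private int[] members = new int[4];
    private int size = 0;

    /** Index if present, else -(insertionPoint + 1), as in Arrays.binarySearch. */
    private int find(int bit) {
        return Arrays.binarySearch(members, 0, size, bit);
    }

    public boolean getBit(int bit) { return find(bit) >= 0; }

    public void setBit(int bit) {
        int idx = find(bit);
        if (idx >= 0) return; // already a member
        int ins = -idx - 1;
        if (size == members.length) members = Arrays.copyOf(members, size * 2);
        System.arraycopy(members, ins, members, ins + 1, size - ins); // shift tail right
        members[ins] = bit;
        size++;
    }

    /** this = this union other, merging two sorted arrays in O(|this| + |other|). */
    public void union(SparseStateSet other) {
        int[] merged = new int[size + other.size];
        int i = 0, j = 0, k = 0;
        while (i < size && j < other.size) {
            int a = members[i], b = other.members[j];
            if (a < b) { merged[k++] = a; i++; }
            else if (a > b) { merged[k++] = b; j++; }
            else { merged[k++] = a; i++; j++; } // common member kept once
        }
        while (i < size) merged[k++] = members[i++];
        while (j < other.size) merged[k++] = other.members[j++];
        members = merged.length == 0 ? new int[4] : merged;
        size = k;
    }

    public boolean isEmpty() { return size == 0; }
    public int cardinality() { return size; }

    // Equality by members, so sets can serve as keys when looking up DFA states.
    @Override public boolean equals(Object o) {
        return o instanceof SparseStateSet s
                && Arrays.equals(members, 0, size, s.members, 0, s.size);
    }

    @Override public int hashCode() {
        int h = 1;
        for (int i = 0; i < size; i++) h = 31 * h + members[i];
        return h;
    }

    public static void main(String[] args) {
        SparseStateSet a = new SparseStateSet();
        a.setBit(5); a.setBit(1); a.setBit(5); // duplicate insert is ignored
        SparseStateSet b = new SparseStateSet();
        b.setBit(3); b.setBit(1_000_000); // costs one int, not ~125 KB of bits
        a.union(b);
        System.out.println(a.cardinality() + " " + a.getBit(1_000_000)); // 4 true
    }
}
```

Membership is a binary search and union is a linear merge, so costs scale with the number of members rather than the total number of positions. A dense array can still be faster when sets are large, so a real implementation might switch between the two representations.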


[jira] Updated: (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs

Posted by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org>.
     [ https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Glavassevich updated XERCESJ-1227:
------------------------------------------

    Summary: Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs  (was: Poor performance / OutOfMemoryError for sequences, choices and nested minOccurs/maxOccurs)


[jira] Commented: (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs

Posted by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519183 ] 

Michael Glavassevich commented on XERCESJ-1227:
-----------------------------------------------

Georg, it doesn't seem like you noticed what this bug report is for. It's a catch-all for a number of more complex / less common cases involving minOccurs/maxOccurs. An improvement for large minOccurs/maxOccurs on elements and wildcards has already been implemented and will be in the next release. That would seem to cover the fragment of your schema which you've posted here.


[jira] Commented: (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs

Posted by "Michael Glavassevich (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12478557 ] 

Michael Glavassevich commented on XERCESJ-1227:
-----------------------------------------------

Chris, you should try out the current code in SVN.  For a number of common cases it will now handle large minOccurs/maxOccurs in constant space and time.
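To illustrate what "constant space and time" means here, the following minimal Java sketch checks a flat sequence of element particles with one counter per particle instead of unrolling maxOccurs into separate states. It is a simplified model, not the Xerces implementation (which, per the issue description, keeps a compact representation in the DFA and uses counters). CountingSequence, ElementParticle and the element name Header are hypothetical, and the sketch assumes the particle names in the sequence are distinct so that greedy matching is unambiguous.

```java
import java.util.List;

/** Hypothetical counter-based checker for a sequence of element particles. */
public class CountingSequence {
    static final int UNBOUNDED = -1;

    record ElementParticle(String name, int minOccurs, int maxOccurs) {}

    /** True if children match the sequence; assumes particle names are distinct. */
    static boolean matches(List<ElementParticle> sequence, List<String> children) {
        int pos = 0;
        for (ElementParticle p : sequence) {
            int count = 0; // a single counter replaces maxOccurs unrolled copies
            while (pos < children.size()
                    && children.get(pos).equals(p.name())
                    && (p.maxOccurs() == UNBOUNDED || count < p.maxOccurs())) {
                count++;
                pos++;
            }
            if (count < p.minOccurs()) return false; // minOccurs violated
        }
        // Anything left over is unexpected, e.g. one occurrence more than maxOccurs.
        return pos == children.size();
    }

    public static void main(String[] args) {
        List<ElementParticle> model = List.of(
                new ElementParticle("Header", 1, 1), // hypothetical element name
                new ElementParticle("CdtTrfTxInf", 1, 9_999_999));
        System.out.println(matches(model, List.of("Header", "CdtTrfTxInf", "CdtTrfTxInf"))); // true
        System.out.println(matches(model, List.of("Header"))); // false
    }
}
```

The work per child element and the memory per particle do not depend on maxOccurs, so in this model maxOccurs="9999999" costs the same as maxOccurs="2".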


[jira] Issue Comment Edited: (XERCESJ-1227) Poor performance / OutOfMemoryError for sequences, choices and nested with large minOccurs/maxOccurs

Posted by "Georg Fischer (JIRA)" <xe...@xml.apache.org>.
    [ https://issues.apache.org/jira/browse/XERCESJ-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519144 ] 

gfis edited comment on XERCESJ-1227 at 8/10/07 3:25 PM:
-----------------------------------------------------------------

With Xerces 2.9.0 I had persistent OutOfMemoryError problems with an ISO 20022 schema which was modified for SEPA. Even -Xmx512m did not help.
The original schema at http://www.iso20022.org/index.cfm?item_id=60055 for pain.001.001.02
works fine. The modified SEPA schema has the following restrictions compared to the ISO schema:

427c427
< 			<xs:element name="CdtTrfTxInf" type="CreditTransferTransactionInformation2" maxOccurs="unbounded"/>
---
> 			<xs:element name="CdtTrfTxInf" type="CreditTransferTransactionInformation2" maxOccurs="9999999"/>
527c527
< 			<xs:element name="RfrdDocAmt" type="ReferredDocumentAmount1Choice" minOccurs="0" maxOccurs="unbounded"/>
---
> 			<xs:element name="RfrdDocAmt" type="ReferredDocumentAmount1Choice" minOccurs="0" maxOccurs="999999"/>

Instances validate with this schema in XMLSpy, but Xerces blows up.

I feel that such high upper bounds do not make much sense, but I would very much like to be able to validate against such schemata with Xerces. If you cannot rewrite the DFA in general, my suggestion for the Xerces code would be:
(a) to silently replace such high maxOccurs values with "unbounded", or
(b) to introduce a special feature which allows the user to control such a replacement.
