Posted to c-users@xerces.apache.org by "S, Srinivasan (Srinivasan)" <sr...@lucent.com> on 2005/10/27 16:52:54 UTC

Schema Validation Performance

Hi,
    I am using Xerces to do schema validation. My schema contains some
complex type objects for which 'maxOccurs' is set to a large value
('5997'). Because of this, schema validation performs poorly.
When I looked into the Xerces-C++ documentation for version 2.7
(page 243), I found the same scenario listed as a limitation of Xerces.
The solution given in that document is to change the value of 'maxOccurs'
to 'unbounded'. But I can't set maxOccurs to 'unbounded', as I have to
impose a maximum occurrence constraint on these complex type objects.
     Is there any other way to improve the performance of the Xerces parser
in this scenario?

     I am also curious to know whether this limitation will be fixed in a
subsequent release of Xerces.

     Please help me with this issue.

Thanks
Srini.

    
    

Re: Schema Validation Performance

Posted by Alberto Massari <am...@datadirect.com>.
Hi Srini,

At 20.22 27/10/2005 +0530, S, Srinivasan (Srinivasan) wrote:
>Hi,
>     I am using Xerces to do schema validation. My schema contains some
>complex type objects for which 'maxOccurs' is set to a large value
>('5997'). Because of this, schema validation performs poorly.
>When I looked into the Xerces-C++ documentation for version 2.7
>(page 243), I found the same scenario listed as a limitation of Xerces.
>The solution given in that document is to change the value of 'maxOccurs'
>to 'unbounded'. But I can't set maxOccurs to 'unbounded', as I have to
>impose a maximum occurrence constraint on these complex type objects.
>      Is there any other way to improve the performance of the Xerces parser
>in this scenario?

Unfortunately no; the algorithm used by the validator requires an NxN 
table (where N is the number of distinct nodes that could occur in the 
XML file; in your case, all the 5997 possibilities) and, in order to 
build this table, it also uses a recursive algorithm that grows the 
stack to its limit. Both operations are slow.
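
To give a feel for how quickly such a table grows, here is a minimal 
sketch of the arithmetic. The function name tableCells is hypothetical 
and this is not code from Xerces; it only assumes the table has one cell 
per pair of nodes, as described above.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Xerces: number of cells in an N x N
// table, where N is the number of distinct nodes the content model allows.
constexpr std::uint64_t tableCells(std::uint64_t n) {
    return n * n;
}

// With maxOccurs="5997", N is at least 5997, so the table has
// 5997 * 5997 = 35,964,009 cells before counting any other nodes.
static_assert(tableCells(5997) == 35964009ULL, "quadratic growth in N");
```

Doubling maxOccurs roughly quadruples the number of cells, which is why 
large bounded values are so much more expensive than small ones.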

>      I am also curious to know whether this limitation will be fixed in a
>subsequent release of Xerces.

There is an open bug 
(http://issues.apache.org/jira/browse/XERCESC-1051) that is assigned 
to me: I have done some exploratory work, but it will take some more 
time before a solution is in place.

If you don't want to modify the schema, consider the workaround 
described in the above-mentioned bug report, and then modify your 
application to check that the total number of objects is in the valid range.
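
The application-side check could look roughly like the sketch below. It 
is only an illustration: the class OccurrenceLimiter and its methods are 
hypothetical names, not part of Xerces, and the sketch assumes the 
validator no longer enforces the upper bound, so the application counts 
elements itself (for example from a SAX2 startElement callback).

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper, not a Xerces class: enforces a maximum number of
// occurrences per element name on the application side.
class OccurrenceLimiter {
public:
    // Register the maximum number of occurrences allowed for an element.
    void setLimit(const std::string& name, std::size_t maxOccurs) {
        limits_[name] = maxOccurs;
    }

    // Call once for every element start. Returns false as soon as the
    // element has been seen more often than its registered limit.
    bool record(const std::string& name) {
        const auto it = limits_.find(name);
        if (it == limits_.end()) {
            return true;  // no limit registered for this element
        }
        return ++counts_[name] <= it->second;
    }

    // Call when the enclosing parent element ends, so that counting
    // starts again for the next parent.
    void reset() { counts_.clear(); }

private:
    std::map<std::string, std::size_t> limits_;
    std::map<std::string, std::size_t> counts_;
};
```

A caller would register the limit once, e.g. setLimit("item", 5997) for 
a hypothetical element named "item", and treat a false return from 
record() as a validation error.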

Hope this helps,
Alberto