Posted to j-dev@xerces.apache.org by "Mukul Gandhi (Jira)" <xe...@xml.apache.org> on 2022/05/25 11:43:00 UTC

[jira] [Commented] (XERCESJ-1745) Save/Restore serialized "compiled" parser-validator

    [ https://issues.apache.org/jira/browse/XERCESJ-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17542006#comment-17542006 ] 

Mukul Gandhi commented on XERCESJ-1745:
---------------------------------------

It seems that quite a bit of work has already been done within Xerces on grammar caching and preparsing. Please see https://xerces.apache.org/xerces2-j/faq-grammars.html.
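
As a rough illustration of the "compile once, reuse many times" idea behind grammar caching, here is a minimal sketch. It uses the standard JAXP javax.xml.validation API, not the Xerces-native grammar pool described in the FAQ above, so it is an analogy and not the mechanism the FAQ documents. The class name CompileOnceDemo, the helper methods and the tiny inline schema are assumptions made up for this example. SchemaFactory.newSchema() serves as the explicit, timeable compile step, and the resulting Schema object is then shared across validations within one JVM. It does not address persisting the compiled schema to a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CompileOnceDemo {

    // Tiny stand-in schema (hypothetical); a real deployment would load its own XSD files.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='greeting' type='xs:string'/>"
      + "</xs:schema>";

    // Explicit "compile" step: parses the XSD and builds the in-memory schema once.
    static Schema compile(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
    }

    // Validates one document against an already compiled schema.
    static boolean isValid(Schema schema, String xml) {
        try {
            // Validator instances are not thread-safe, so create one per use;
            // the Schema object itself is immutable and can be shared.
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Time only the compile step, separately from any validation.
        long start = System.nanoTime();
        Schema schema = compile(XSD);
        long compileMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("schema compile time (us): " + compileMicros);

        // Reuse the same compiled schema for several documents.
        System.out.println(isValid(schema, "<greeting>hello</greeting>"));
        System.out.println(isValid(schema, "<farewell>bye</farewell>"));
    }
}
```

With a large schema set, timing compile() in this way would show how much of the startup cost is actually spent building the schema, which seems worth measuring before considering any serialization work.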

I doubt that serializing Xerces's XSModel instance to a file and deserializing it would make XML Schema validation much faster. I suspect that the serialized file representation of an XSModel may be large (for XML schemas whose in-memory content model state machines are large), in which case it would not serve the purpose stated in this feature request.

Moreover, changing many of the XML Schema related data structures in Xerces to implement java.io.Serializable may be a very complicated and extensive change, and may put the stability of the Xerces implementation at risk.

> Save/Restore serialized "compiled" parser-validator
> ---------------------------------------------------
>
>                 Key: XERCESJ-1745
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1745
>             Project: Xerces2-J
>          Issue Type: New Feature
>          Components: Other, Serialization
>    Affects Versions: 2.12.2
>            Reporter: Mike Beckerle
>            Priority: Major
>
> Feature requested by Apache Daffodil project PMC.
>  
> We use Xerces-J to validate XML files. 
>  
> The schemas of these files are huge. Think 300+ fairly large XSD files all included/imported together. Megabytes of XSD. 
>  
> In order to validate+parse faster, we know Xerces does something akin to "compiling" the XSD into lower-level data structures. 
>  
> The requested feature is to make this "compilation" step of the large XSD schema explicit, and then be able to serialize the resulting java object to a file. Subsequently one can reload this pre-compiled object so as not to face this compiling overhead at startup time.
>  
> An API call to explicitly force this compilation step, so that the time taken to do it can be measured, is an important part of this feature. This compilation can also occur automatically on first use, without requiring an explicit "compile it now" API call, and that would retain perfect compatibility with Xerces APIs today.
>  
> But for very large XSDs, it is of value to be able to time this compile activity, so a new API method that causes Xerces to do this compilation step explicitly (and which is separate from the serialization of the resulting object) is of value.
>  
> In summary, I think numerous internal data structures within Xerces would have to be made Serializable, and methods such as compileParser(), saveParser(java.io.OutputStream) and restoreParser(java.io.InputStream), or something along those lines, would be needed.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
