Posted to dev@xalan.apache.org by bu...@apache.org on 2002/10/23 20:11:08 UTC

DO NOT REPLY [Bug 13897] New: - Reuse parser and cache XML schema in XalanC

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=13897

           Summary: Reuse parser and cache XML schema in XalanC
           Product: XalanC
           Version: 1.4.x
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: XalanC
        AssignedTo: xalan-dev@xml.apache.org
        ReportedBy: thomas.cherel@ascentialsoftware.com


It would be nice for XalanC to expose the latest Xerces features for caching 
an analyzed schema so that it can be reused across multiple parses/validations.
This would also mean reusing the same parser instance across multiple XSLT 
transformations in XalanC, and even within specific XSLT functions such as 
document().

Here is a short version of an email exchange from the mailing list that 
describes the issue in more detail and provides a "workaround".

-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com] 
Sent: Tuesday, October 22, 2002 4:35 PM
To: xalan-c-users@xml.apache.org
Subject: RE: Schema validation performance

Hi Thomas,

You can use Xerces to parse a document without switching to the internal
interfaces.  Here's some pseudo-code, which I haven't tested, but which
should give you an idea of what you need to do:

void
parse(
      const InputSource&         theInputSource,
      XalanCompiledStylesheet*   theStylesheet,
      const XSLTResultTarget&    theResultTarget)
{
   // Create a SAX2 reader directly through Xerces.
   SAX2XMLReader* const   theReader = XMLReaderFactory::createXMLReader();

   XalanTransformer   theTransformer;

   // The builder receives the SAX events and builds the source tree.
   XalanDocumentBuilder* const   theBuilder =
      theTransformer.createDocumentBuilder();

   theReader->setContentHandler(theBuilder->getContentHandler());
   theReader->setLexicalHandler(theBuilder->getLexicalHandler());
   theReader->setDTDHandler(theBuilder->getDTDHandler());

   const XalanDOMString
   reuseGrammar("http://apache.org/xml/features/validation/reuse-grammar");
   const XalanDOMString
   namespacePrefixes("http://xml.org/sax/features/namespace-prefixes");

   theReader->setFeature(reuseGrammar.c_str(), true);
   theReader->setFeature(namespacePrefixes.c_str(), true);

   theReader->parse(theInputSource);

   delete theReader;

   theTransformer.transform(*theBuilder, theStylesheet, theResultTarget);
}

Of course, since I'm not really re-using the parser, it doesn't use the
cached grammar, but it gives you an idea of how you can do this.  The only
drawback is that documents brought into the transformation through the
document() function will not use this parser instance, and so will not use
the cached grammar.

Dave

-----Original Message-----
From: Thomas Cherel

Until it gets added to Xalan, is there any way I can use the Xerces
interface directly? For example, today, I can provide to Xalan an already
parsed document (a DOM tree). Can I use the new Xerces API to generate such
a DOM tree (and reuse schema/grammar for the validation that will be done at
that time), and then pass it to Xalan (that will take care of the XSLT
processing only)?


Thomas


-----Original Message-----
From: David N Bertoni/Cambridge/IBM [mailto:david_n_bertoni@us.ibm.com]
Sent: Tuesday, October 22, 2002 1:19 PM
To: xalan-c-users@xml.apache.org
Subject: Re: Schema validation performance


Hi Thomas,


With the latest Xerces, you can prime a parser instance with a particular
schema, then have it re-use that schema over and over again.  You can also
have it re-use a grammar for every document it parses.  However, these
interfaces are new and still experimental, so I don't have much experience
using them.


We don't expose lots of the Xerces parser interfaces because it gets very
burdensome to do so.  However, this one is probably worth doing, so you
might want to enter a Bugzilla request for an enhancement.


Dave

-----Original Message-----
From: Thomas Cherel

When processing an XML document (applying a style sheet), I can turn on
validation of the XML document against its schema. Is there any way (or
maybe this is already done under the covers) to cache the XML schema for
validation of other XML documents?

What I mean is that if I process a bunch of XML documents in sequence, and
all of them use the same XML schema, it would be nice if the schema were
downloaded and analyzed only once instead of once per document.
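
For illustration only, the "download and analyze only once" behavior asked
for above can be sketched with the standard library alone. This is not the
Xerces or XalanC API: CompiledSchema, SchemaCache, and the loader callback
are hypothetical names invented for this sketch, and the loader stands in
for whatever actually fetches and analyzes a schema.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a schema that has been fetched and analyzed.
struct CompiledSchema
{
    std::string location;
};

// Hypothetical cache: keeps one analyzed schema per schema location, so the
// expensive load-and-analyze step runs at most once per location.
class SchemaCache
{
public:
    typedef std::function<CompiledSchema(const std::string&)> Loader;

    explicit SchemaCache(Loader loader) :
        m_loader(std::move(loader)),
        m_loadCount(0)
    {
        assert(m_loader);  // an empty loader could never produce a schema
    }

    // Returns the cached schema, invoking the loader on first request only.
    std::shared_ptr<const CompiledSchema>
    get(const std::string& location)
    {
        auto it = m_schemas.find(location);

        if (it == m_schemas.end())
        {
            CompiledSchema schema = m_loader(location);
            ++m_loadCount;

            it = m_schemas.emplace(
                    location,
                    std::make_shared<CompiledSchema>(std::move(schema))).first;
        }

        return it->second;
    }

    // Number of times the loader has actually run.
    std::size_t loadCount() const { return m_loadCount; }

private:
    Loader m_loader;
    std::map<std::string, std::shared_ptr<const CompiledSchema> > m_schemas;
    std::size_t m_loadCount;
};
```

A batch job would create one SchemaCache, call get() with the schema
location of each document before validating it, and so pay the analysis
cost only for the first document that uses a given schema.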