
Posted to j-dev@xerces.apache.org by ne...@ca.ibm.com on 2002/02/06 00:52:27 UTC

grammar caching requirements

Hi folks,

Now that we've got a framework for grammar caching, it seems like a
pretty good time to start a discussion of what our default
implementation should look like.  We'll also need to make sure that all
the components we want to use the grammar caching framework actually
do; I'm thinking especially of the DTDValidator here, since we
currently have no way of caching DTD grammars.

Since we've got a couple of fairly significant bugs that we know of in
Xerces 2.0.0, it would be nice to put out a refresh fairly soon.  So
I'm hoping that we can keep both our grammar-caching requirements--and
our discussion of them :-)--as brief as possible.

As a starting-point, here's my idea of the functionality a default
grammar caching implementation should have:

1.  It should be thread-safe;
2.  It should be as easy to use from SAX, DOM, XNI or even JAXP as possible;
3.  It should encompass both XML Schemas and DTD's;
4.  It should permit grammars to be preparsed or cached as they are
    encountered while validating instance documents;
5.  It should permit the application to "lock" the cache, that is,
    prevent any more grammars from being added.

Now obviously, if a user is interested only in 2., then 4. and 5. don't
apply (since there's no concept of grammar preparsing in SAX, XNI or
JAXP).  So we won't be able to satisfy everyone all the time--but then
nothing new there.  :-)

So here's the sort of implementation I have in mind:

- We need an XMLGrammarConfiguration which subclasses
  StandardParserConfiguration.  This class has:
    - a static XMLGrammarPoolImpl and a SymbolTable
    - a no-arg constructor which passes these into the
      StandardParserConfiguration that it extends
    - a method similar to the DOM 3
      DOMASBuilder#parseASURI(...)
      (except perhaps with clearer semantics)
    - a method to stop XMLGrammarPoolImpl from receiving any more
      grammars

Under this regime, it'll be possible to access Xerces2's grammar
caching functionality even through JAXP, using Andy's
configuration-selection logic (this is why we need a no-arg
constructor on this class).  If one does this, every time a SAXParser
or DOMParser is manufactured, it will share the same Grammar cache as
all the others.
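
To make the sharing behaviour concrete, here's a rough, self-contained
sketch. It is not Xerces code: BaseConfigSketch stands in for
StandardParserConfiguration, plain maps stand in for the SymbolTable and
XMLGrammarPoolImpl, and every class and method name below is made up for
illustration. Preparsing is reduced to handing over an already-built
grammar object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the parent configuration; it just keeps
// the symbol table and grammar pool it is handed.
class BaseConfigSketch {
    protected final Map<String, String> symbolTable;
    protected final Map<String, Object> grammarPool;

    BaseConfigSketch(Map<String, String> symbolTable,
                     Map<String, Object> grammarPool) {
        this.symbolTable = symbolTable;
        this.grammarPool = grammarPool;
    }
}

// Hypothetical caching configuration: the static fields are shared by
// every instance, so a no-arg constructor is enough to join the cache.
class CachingConfigSketch extends BaseConfigSketch {
    private static final Map<String, String> SHARED_SYMBOLS = new HashMap<>();
    private static final Map<String, Object> SHARED_POOL = new HashMap<>();
    private static boolean locked = false;

    CachingConfigSketch() {
        super(SHARED_SYMBOLS, SHARED_POOL);
    }

    // Stores a grammar unless the pool is locked; returns true if stored.
    boolean cacheGrammar(String key, Object grammar) {
        synchronized (SHARED_POOL) {
            if (locked) {
                return false;
            }
            grammarPool.put(key, grammar);
            return true;
        }
    }

    // Returns the cached grammar for the key, or null if there is none.
    Object lookupGrammar(String key) {
        synchronized (SHARED_POOL) {
            return grammarPool.get(key);
        }
    }

    // Stops the shared pool from receiving any more grammars.
    static void lockGrammarPool() {
        synchronized (SHARED_POOL) {
            locked = true;
        }
    }
}
```

In this sketch, a grammar cached through one instance is visible through
any other, and once lockGrammarPool() has been called, cacheGrammar()
returns false for every instance.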

On the other hand, users who want a bit of added functionality can
instantiate this configuration directly.  This way they'll be able to
access the preparsing and locking functionality--although this will
require using a Xerces-specific implementation, thus stepping away
from standard API's.  Such a user would also be able to extend this
configuration, perhaps by having a non-threadsafe implementation of
XMLGrammarPool or other custom enhancements.

I envisage XMLGrammarPoolImpl as being a very simple collection of a
couple of hashtables for DTD's and schemas.  They could probably hash
directly on the XMLGrammarDescriptions for these types of grammars.
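
Roughly what I have in mind, as a minimal sketch only. GrammarKeySketch
is a hypothetical stand-in for an XMLGrammarDescription (the real
interface will carry more than a type and a single identifier), grammars
are plain Objects, and the lock/unlock method names are my own
assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical description used as the hash key. equals() and
// hashCode() are what allow hashing directly on the description.
final class GrammarKeySketch {
    final String type; // "DTD" or "XSD"
    final String id;   // e.g. a system id or a target namespace

    GrammarKeySketch(String type, String id) {
        this.type = Objects.requireNonNull(type);
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GrammarKeySketch)) {
            return false;
        }
        GrammarKeySketch k = (GrammarKeySketch) o;
        return type.equals(k.type) && Objects.equals(id, k.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, id);
    }
}

// Hypothetical pool: one table for DTDs, one for schemas, plus a lock
// flag. Every method is synchronized, so the pool is thread-safe.
final class GrammarPoolSketch {
    private final Map<GrammarKeySketch, Object> dtds = new HashMap<>();
    private final Map<GrammarKeySketch, Object> schemas = new HashMap<>();
    private boolean locked = false;

    private Map<GrammarKeySketch, Object> tableFor(GrammarKeySketch key) {
        return "DTD".equals(key.type) ? dtds : schemas;
    }

    // Caches the grammar unless the pool is locked; returns true if stored.
    synchronized boolean put(GrammarKeySketch key, Object grammar) {
        if (locked) {
            return false;
        }
        tableFor(key).put(key, grammar);
        return true;
    }

    // Returns the cached grammar, or null if none matches the key.
    synchronized Object get(GrammarKeySketch key) {
        return tableFor(key).get(key);
    }

    synchronized void lock() {
        locked = true;
    }

    synchronized void unlock() {
        locked = false;
    }
}
```

Lookups keep working after lock(); only additions are refused.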

I suspect it's not realistic to permit DTD preparsing yet.  But I do
think it should be relatively straightforward to induce the
XMLDTDValidator to co-operate with our grammar caching scheme--I'd
certainly invite feedback on this point though.

I looked over the CachingParserPool that we currently have (has anyone
used it?) but I can't see any way of adapting this sort of approach to
suit users who don't want to bind themselves closely to our
implementation.  Thoughts?

Have I missed anything?  Do folks think this will work?  Even more
important, since I won't be around all that much this month--I've got
some vacation time coming up and some other commitments--anyone want
to help?

Any suggestions on how the work might be broken down?

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org

Re: grammar caching requirements

Posted by Andy Clark <an...@apache.org>.

neilg@ca.ibm.com wrote:
> Now that we've got a framework for grammar caching, it seems like a
> pretty good time to start a discussion of what our default
> implementation should look like.  We'll also need to make sure that all

Since we're starting the discussion about implementing the
generic grammar caching mechanism, I went back to the grammar
caching framework that we argued so much over. And I would 
like to quickly revisit this subject before continuing.

The fundamental question is how grammars are to be identified
by validation components and the grammar pool. Currently, we
have a grammar type such as "DTD" and "XSD"; and we have a
description defined by the XMLGrammarDescription interface.
Do we need both?

An application may preload the cache with DTD and/or XML
Schema grammars. The parser, when it parses the document,
then requests a grammar by specifying a grammar description.
For a DTD, this would be the rootElementName, publicId, and
systemId specified in the DOCTYPE declaration, for example.

So I think we *should* keep this information linked: grammars
and grammar descriptions. Perhaps something like the following:

  public interface Grammar {
    public XMLGrammarDescription getGrammarDescription();
  }

  public interface XMLGrammarDescription
    extends XMLResourceIdentifier {
    public String getGrammarType();
  }

The XMLGrammarPool interface would stay the same.

I suggest this change for two reasons: 1) having methods to 
identify the grammar type on *both* the grammar and grammar 
description interfaces seems superfluous; and 2) I think
that it would be simpler in the end to keep the grammar
description information with the grammar.

If a grammar doesn't keep a copy of its associated grammar
description then each grammar pool instance needs to have
all of the logic to determine if a requested grammar
description properly identifies a registered grammar. 
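
As a rough illustration of the point, here is a self-contained sketch in
which each grammar carries its own description and the pool does nothing
more than compare descriptions. All of the names are hypothetical
stand-ins rather than the real Xerces interfaces, and the matching rule
(exact equality of root element name, public id and system id) is only
an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, cut-down versions of the two interfaces.
interface DescriptionSketch {
    String getGrammarType();
}

interface GrammarSketch {
    DescriptionSketch getGrammarDescription();
}

// A DTD description decides for itself what counts as a match.
final class DtdDescriptionSketch implements DescriptionSketch {
    private final String rootElementName;
    private final String publicId;
    private final String systemId;

    DtdDescriptionSketch(String rootElementName, String publicId,
                         String systemId) {
        this.rootElementName = rootElementName;
        this.publicId = publicId;
        this.systemId = systemId;
    }

    @Override
    public String getGrammarType() {
        return "DTD";
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DtdDescriptionSketch)) {
            return false;
        }
        DtdDescriptionSketch d = (DtdDescriptionSketch) o;
        return Objects.equals(rootElementName, d.rootElementName)
            && Objects.equals(publicId, d.publicId)
            && Objects.equals(systemId, d.systemId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(rootElementName, publicId, systemId);
    }
}

// The grammar keeps a reference to the description it was built for.
final class DtdGrammarSketch implements GrammarSketch {
    private final DescriptionSketch description;

    DtdGrammarSketch(DescriptionSketch description) {
        this.description = description;
    }

    @Override
    public DescriptionSketch getGrammarDescription() {
        return description;
    }
}

// The pool needs no grammar-specific logic: it asks each grammar for
// its description and compares that with the requested one.
final class DescriptionPoolSketch {
    private final List<GrammarSketch> grammars = new ArrayList<>();

    synchronized void cache(GrammarSketch grammar) {
        grammars.add(grammar);
    }

    synchronized GrammarSketch retrieve(DescriptionSketch requested) {
        for (GrammarSketch g : grammars) {
            if (g.getGrammarDescription().equals(requested)) {
                return g;
            }
        }
        return null;
    }
}
```

With this arrangement, a different pool implementation could reuse the
same descriptions without duplicating any of the matching logic.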

Does this make sense?

> 3.  It should encompass both XML Schemas and DTD's;

And more...

> 4.  It should permit grammars to be preparsed or cached as they are
>     encountered while validating instance documents;
> 5.  It should permit the application to "lock" the cache, that is,
>     prevent any more grammars from being added.

And we need to allow a DTD grammar to a) be used when the
document contains no DOCTYPE declaration and b) override
the grammar specified in the DOCTYPE declaration.

> I suspect it's not realistic to permit DTD preparsing yet.  But I do

It's not the DTD preparsing that I think is difficult. I'm
more concerned about how to handle internal subsets with a
cached grammar.

> I looked over the CachingParserPool that we currently have (has anyone
> used it?) but I can't see any way of adapting this sort of approach to

The caching parser pool was never used (and could not be
used) because we didn't have a grammar caching facility.
But it was written as a placeholder with an idea towards
the future where we could support grammar caching.

-- 
Andy Clark * andyc@apache.org
