Posted to j-dev@xerces.apache.org by Mark Swinkels <Ma...@Attachmate.com> on 2000/10/20 19:03:43 UTC

Xerces-2 design question - server usage

It seems to me that there are some uses of global variables and 'instance'
data in the current Xerces that will limit the ability to use Xerces in a
server environment where there may be many instances of the parser running
at once within one VM.

I think that these areas need to be carefully looked at to make sure that
the design isn't limiting the applicability of Xerces.

The instances I've found are mostly in the Validator but may affect other
areas of Xerces as well.

1) The ID, IDREF and ENTITY datatype validators all store information
locally to the validator. This makes it impossible to cache grammars
containing these elements. There is a state object that gets passed into the
validation call that could be used to hold this information on a per parse
basis.

2) The datatype registry is currently a singleton, meaning that different
parser instances can affect each other in unexpected ways. Handling this
registry correctly in the face of cached grammars and the like is tricky and
will require some thought. I think that in the end the registry will have
to be attached either to individual grammars or to the grammar cache. The
registry should also probably be handled more like symbol tables in a
compiler so that the registry for the built-in types can be shared among
grammar instances.
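
As a rough illustration of the symbol-table idea in (2), here is a minimal
sketch in which each grammar owns a small registry that falls back to a
shared, read-only registry of built-in types. All names here
(DatatypeRegistry, DatatypeValidator, BUILT_INS) are hypothetical and are
not the actual Xerces classes; the "boolean" check is deliberately
simplified.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the Xerces API.
interface DatatypeValidator {
    boolean isValid(String lexicalValue);
}

final class DatatypeRegistry {
    // Shared root scope holding the built-in types. It is created once,
    // frozen, and published through a static final field.
    static final DatatypeRegistry BUILT_INS = createBuiltIns();

    private final DatatypeRegistry parent; // null for the root scope
    private final Map<String, DatatypeValidator> types = new HashMap<>();
    private boolean frozen;

    DatatypeRegistry(DatatypeRegistry parent) {
        this.parent = parent;
    }

    private static DatatypeRegistry createBuiltIns() {
        DatatypeRegistry root = new DatatypeRegistry(null);
        // Simplified stand-in for a real built-in type.
        root.register("boolean", s ->
                s.equals("true") || s.equals("false")
                        || s.equals("1") || s.equals("0"));
        root.frozen = true; // no further registrations on the shared scope
        return root;
    }

    // Adds a type to this scope only; other grammars never see it.
    void register(String name, DatatypeValidator validator) {
        if (frozen) {
            throw new IllegalStateException("registry is read-only");
        }
        types.put(name, validator);
    }

    // Looks in this scope first, then walks up to the parent scopes,
    // the way a compiler resolves a name through nested symbol tables.
    DatatypeValidator lookup(String name) {
        for (DatatypeRegistry r = this; r != null; r = r.parent) {
            DatatypeValidator v = r.types.get(name);
            if (v != null) {
                return v;
            }
        }
        return null;
    }
}
```

With this layout a grammar would create its own scope with
new DatatypeRegistry(DatatypeRegistry.BUILT_INS), so user-defined types
stay local to that grammar while the built-in validators are shared.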

I see two levels of context needed for Xerces. One is a per-parse context
which holds information about the current parse. The other is a place to
keep things like the grammar cache which persist from parser instance to
parser instance. I think things like the entity resolver also belong in the
latter context because of the way they interact with the grammar cache.
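
To make the two levels concrete, here is a minimal sketch, assuming a
shared context that owns the grammar cache and the entity resolver, and a
per-parse context that holds the ID/IDREF bookkeeping from (1). The class
names (SharedContext, ParseContext, Grammar) and the use of a plain
Function as the resolver are assumptions for illustration only, not the
Xerces API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical names throughout; this is not the Xerces API.

/** Stand-in for a parsed grammar that holds no per-parse state. */
final class Grammar {
    final String systemId;

    Grammar(String systemId) {
        this.systemId = systemId;
    }
}

/** Long-lived context, shared by every parser instance in the VM. */
final class SharedContext {
    private final Map<String, Grammar> grammarCache = new ConcurrentHashMap<>();
    // Maps a requested identifier to the system id that is actually loaded.
    private final Function<String, String> entityResolver;

    SharedContext(Function<String, String> entityResolver) {
        this.entityResolver = entityResolver;
    }

    // The cache key is the resolved id, which is one reason the resolver
    // sits next to the cache rather than in the per-parse context.
    Grammar grammarFor(String requestedId) {
        String resolved = entityResolver.apply(requestedId);
        return grammarCache.computeIfAbsent(resolved, Grammar::new);
    }
}

/** Short-lived context: one per parse, never shared between parses. */
final class ParseContext {
    private final Set<String> declaredIds = new HashSet<>();
    private final Set<String> referencedIds = new HashSet<>();

    // Returns false if the ID was already declared in this document.
    boolean declareId(String id) {
        return declaredIds.add(id);
    }

    void referenceId(String id) {
        referencedIds.add(id);
    }

    // IDREFs that never matched an ID; checked at the end of the parse.
    Set<String> unresolvedIdRefs() {
        Set<String> missing = new HashSet<>(referencedIds);
        missing.removeAll(declaredIds);
        return missing;
    }
}
```

Because the ID and IDREF sets live in ParseContext rather than in the
validators, two concurrent parses can use the same cached Grammar without
seeing each other's identifiers.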

-- Mark

Re: Xerces-2 design question - server usage

Posted by Andy Clark <an...@apache.org>.

Mark,

You make a lot of very good points but it's important to know
what codebase you are looking at. Are you looking at the code
for validation in Xerces 1.x or the stuff that has been moved
over to Xerces2?

We will make sure to avoid any singleton-style objects that
could cause problems for caching. Since the validation code is,
for now, being directly moved from Xerces 1.x to Xerces2, the
same types of problems that you mention could appear in the new
codebase. We will be reviewing the validation implementation to
ensure that everything is OK. 

If you'd like to help with this, contact Jeffrey Rodriguez 
(jeffreyr@us.ibm.com) or Eric Ye (ericye@apache.org). We should
probably re-evaluate the Schema implementation, as well, to try
to simplify/clean the code and see how we can avoid using the
DOM for the implementation. Contributions in this area are
gladly accepted! :)

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org