Posted to j-dev@xerces.apache.org by ne...@ca.ibm.com on 2002/06/26 22:37:51 UTC

Xerces and static mutability

Hi folks,

For me, one of the neatest things about working on Xerces is the
opportunity to learn about the plethora of products for which Xerces is a
base technology.  Sitting as it does at pretty much the lowest level of XML
processing, a Xerces developer gets to find out about the needs of all
kinds of different products that need to interact with XML.

As of J2SE 1.4, one type of product that needs to understand XML is a JVM.
In fact, since SAX, DOM and JAXP are now core specifications in J2SE/EE,
any implementation of these specifications needs to have an XML parser
right at its core.

And Xerces is--at least for some JDK implementors--the parser of choice!
We were already shipped in IBM's JDK 1.3.0; we'll be there in 1.4 as well.
And that, in itself, seems to me to be fairly neat.

But with this popularity comes the multitude of needs I mentioned before.
For instance, certain IBM JDKs (and I'm betting IBM won't be the only
implementor to offer choices like this) are "reusable". For more
information on IBM's version of this kind of JDK, you could look here:

http://www-1.ibm.com/servers/eserver/zseries/software/java/pdf/jtc0a100.pdf

(if you don't mind PDF!)

As a brief summary, what this means is that the same JVM can be used by
successive applications.  Basically, between application sessions, the JVM
gets reinitialized or reset.
But being reset doesn't affect classes that lie at the heart of this kind
of JVM.  And XML parsing has moved so far down in the application stack
that the XML parser doesn't get reset between application sessions.

So, when Xerces is used in this kind of JVM, values that are static in
Xerces will carry forward from one application into another.  Therefore, if
an application is able to modify any of our static values in such a way
that Xerces's behaviour will be altered, we have a problem--the next
application might not work and the JVM is effectively made non-reusable
(read broken).  Since it's not possible for a user to know how an
application she wants to use works, we need to make sure there aren't any
ways for this kind of problem to arise because of some interaction that an
application has with Xerces.

In practice, it doesn't look like meeting this requirement will cause much
disruption.  We'll have to make many static variables throughout our code
private--or at least package-protected.  Sometimes, we'll have to change
accessor methods to return clones of static objects instead of the actual
objects themselves.  But, applications which don't use Xerces
internals--and applications shouldn't be using Xerces internals because
they could change at any moment for all kinds of other reasons--shouldn't
be affected.
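
To make the accessor change concrete, here is a minimal sketch of the
pattern. The class name ComponentSketch and its accessor are hypothetical,
chosen for illustration only, and are not taken from the Xerces sources; the
two feature URIs are standard SAX feature names used as sample data.

```java
// Hypothetical component; the names here are illustrative, not actual Xerces code.
final class ComponentSketch {
    // Before (the risky shape): a public static array. Even if declared final,
    // any caller could still overwrite its individual elements.
    // public static String[] RECOGNIZED_FEATURES = { ... };

    // After: private and final, so outside code can neither reach nor reassign it.
    private static final String[] RECOGNIZED_FEATURES = {
        "http://xml.org/sax/features/validation",
        "http://xml.org/sax/features/namespaces",
    };

    // Hand out a copy so callers cannot modify the shared static array.
    static String[] getRecognizedFeatures() {
        return RECOGNIZED_FEATURES.clone();
    }

    private ComponentSketch() {}
}
```

A caller that scribbles on the returned array only damages its own copy; the
next call to the accessor still sees the original values.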

So I'll be checking in changes over the next couple of weeks to make Xerces
"statically immutable" as the terminology goes.  As I do this, I'll try to
(1) not affect any externally visible, and especially any externally useful
behaviour; (2) not impact Xerces's performance; (3) keep Xerces as
extensible as possible.  Sandy's already been helping me with some of these
issues, so we'll have a second mind watching this stuff.

Here's a partial list of things that will need to be fixed.  If anyone
thinks one of these changes might break them, this would be a great time to
speak up!

CoreDocumentImpl#kidOK:  make private final.

Many classes:  make RECOGNIZED_FEATURES and RECOGNIZED_PROPERTIES private
final, and return clones of these objects in the appropriate getter
methods.  We only use these when building configuration pipelines
internally at the moment, so this shouldn't be a measurable performance
problem.

XSAttributeChecker:  make ATTIDX_COUNT private
    make a few static members private final and rework some others so
    they're not exposed to the outside world.

XMLChar:  make CHARS byte array private.

Version:  Made fVersion final.

Base64:  made base64Alphabet and lookUpBase64Alphabet final

Hexbin:  made hexNumberTable and lookUpHexAlphabet final

UCSReader:  made UCS(2|4)(B|L)E final

ExceptionMessages, ImplementationMessages, DatatypeMessages:  we can
     probably live without these files entirely since they aren't
     ref'd anywhere.

XPath$XPathScanner:  made fASCIICharMap final

XPath$Tokens:  made fgTokenNames private instead of public

ParserForXMLSchema:  changed range and ranges to private

REUtil:  changed regexCache to be final; it is never changed, but
    it never hurts to be sure...

RegularExpression:  changed declaration of wordchar so that it is
    local to the method in which it is initialized and used.

Token:  changed blockNames to private from package-protected
    changed categories and categories2 from package protected to
    private final
    changed categoryNames to private
    token_0to9:  package protected
    token_not_0to9:  package protected
    token_not_wordchars:  package protected
    token_wordchars:  package protected
    token_not_wordedge:  package protected
    token_wordedge:  package protected
    token_dot:  package protected
    token_not_spaces:  package protected
    token_spaces:  package protected
    token_empty:  package protected
    token_linebeginning:  package protected
    token_linebeginning2:  package protected
    token_wordbeginning:  package protected
    token_string_beginning:  package protected
    token_lineend:  package protected
    token_wordend:  package protected
    token_stringend:  package protected
    token_stringend2:  package protected
    getCombiningCharacterSequence:  protected->package protected
    getGraphemePattern:  protected->package protected

SchemaSymbols:  make fSchemaSymbols in class and in the inner class
    private.

IDValue:  made VS private.

XSDComplexTypeTraverser:  removed fErrorContent; now it's created each
    time getErrorContent is called, but that's only during error
    conditions so this loses little.
    Changed to make work with a modified restricted XSComplexTypeDecl.

HTMLSerializer:  made _xhtml nonstatic and made XHTMLNamespace final
    (this should be correctly initialized however).

SchemaGrammar:  Modify so no static members remain visible.

XSComplexTypeDecl:  In order that xsd:anyType can be of this type,
     we'll have to rearrange this a bit.

XSSimpleTypeDecl:  fix so that objects of this kind know whether
     they're schema primitive types (therefore are static), and
     therefore shouldn't be modified.

SchemaDVFactory:  this is the hardest one; have to make setFactory make
     sense in the context of static immutability.

Hope that makes sense.  As always, thoughts much appreciated--especially if
they touch on how SchemaDVFactory, SchemaGrammar et al can be modified so
as not to hurt performance, retain functionality and yet achieve the
objective of making them statically immutable.

Cheers!
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  neilg@ca.ibm.com



---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: Xerces and static mutability

Posted by Andy Clark <an...@apache.org>.
neilg@ca.ibm.com wrote:
> Version:  Made fVersion final.

Here is one case (there may be others) where we CANNOT
make this field "final". This field was final to begin
with but then there was a problem reported by users --
IBMers, in fact.

Here's the deal: the users in question have code that
relies on the version information found in the Version
class. So they have code something like this:

   if (Version.fVersion.equals("2.0.1")) { ... }

However, if the fVersion field is made final, the Java
compiler copies the constant directly into the compiled
class of the caller. Therefore, the comparison is always
made against the constant value at the time of compilation
and not the value in the Version class loaded at runtime.
See the problem?

The workaround, at that time, was for us to make this
field non-final by default. For backwards compatibility,
it should probably stay this way. In hindsight, though,
it probably would've been better to turn this into a
method that people called.
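
As a rough sketch of the method-based alternative: the class VersionSketch
and its members below are hypothetical names, and the version string is just
the one from the example above. It also assumes, per my reading of the
language rules, that a final field assigned in a static initializer (rather
than in its declaration) is not treated as a compile-time constant and so is
not copied into callers.

```java
// Hypothetical stand-in for a version class; names and value are illustrative only.
final class VersionSketch {
    // A blank final assigned in a static initializer is not a compile-time
    // constant, so the compiler should not copy its value into calling classes.
    public static final String VERSION;
    static {
        VERSION = "2.0.1";
    }

    // An accessor is always resolved at runtime against the class actually loaded.
    public static String getVersion() {
        return VERSION;
    }

    private VersionSketch() {}
}
```

Callers would then write something like
VersionSketch.getVersion().equals("2.0.1") and get the value from whichever
version of the class is on the classpath at runtime.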

> XMLChar:  make CHARS byte array private.

This was one where I wanted it to be publicly accessible
so that people didn't have to call methods for checking
the status of a character. But Java doesn't have a way
of making array contents read-only and the current state
of JVM technology is such that this is probably not an
issue.
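
For illustration, a minimal sketch of what a private table behind small
checking methods could look like. CharTableSketch, its masks and its methods
are hypothetical and cover only ASCII digits and letters; this is not the
real XMLChar table.

```java
// Hypothetical lookup table; masks, size and method names are illustrative only.
final class CharTableSketch {
    private static final int MASK_DIGIT = 0x01;
    private static final int MASK_LETTER = 0x02;

    // Private, so no outside code can overwrite entries in the table.
    private static final byte[] CHARS = new byte[128];
    static {
        for (char c = '0'; c <= '9'; c++) CHARS[c] |= MASK_DIGIT;
        for (char c = 'a'; c <= 'z'; c++) CHARS[c] |= MASK_LETTER;
        for (char c = 'A'; c <= 'Z'; c++) CHARS[c] |= MASK_LETTER;
    }

    // Small static methods like these are the only way to read the table.
    static boolean isDigit(int c) {
        return c >= 0 && c < CHARS.length && (CHARS[c] & MASK_DIGIT) != 0;
    }

    static boolean isLetter(int c) {
        return c >= 0 && c < CHARS.length && (CHARS[c] & MASK_LETTER) != 0;
    }

    private CharTableSketch() {}
}
```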

-- 
Andy Clark * andyc@apache.org

