Posted to j-dev@xerces.apache.org by ne...@ca.ibm.com on 2002/06/26 22:37:51 UTC
Xerces and static mutability
Hi folks,
For me, one of the neatest things about working on Xerces is the
opportunity to learn about the plethora of products for which Xerces is a
base technology. Sitting as it does at pretty much the lowest level of XML
processing, a Xerces developer gets to find out about the needs of all
kinds of different products that need to interact with XML.
As of J2SE 1.4, one type of product that needs to understand XML is a JVM.
In fact, since SAX, DOM and JAXP are now core specifications in J2SE/EE,
any implementation of these specifications needs to have an XML parser
right at its core.
And Xerces is--at least for some JDK implementors--the parser of choice!
We were already shipped in IBM's JDK 1.3.0; we'll be there in 1.4 as well.
And that, in itself, seems to me to be fairly neat.
But with this popularity comes the multitude of needs I mentioned before.
For instance, certain IBM JDKs (and I'm betting IBM won't be the only
implementor to offer choices like this) are "reusable". For more
information on IBM's version of this kind of JDK, you could look here
http://www-1.ibm.com/servers/eserver/zseries/software/java/pdf/jtc0a100.pdf
(if you don't mind PDF!)
As a brief summary, what this means is that the same JVM can be used by
successive applications. Basically, between application sessions, the JVM
gets reinitialized or reset.
But being reset doesn't affect classes that lie at the heart of this kind
of JVM. And XML parsing has moved so far down in the application stack
that the XML parser doesn't get reset between application sessions.
So, when Xerces is used in this kind of JVM, values that are static in
Xerces will carry forward from one application into another. Therefore, if
an application is able to modify any of our static values in such a way
that Xerces's behaviour will be altered, we have a problem--the next
application might not work and the JVM is effectively made non-reusable
(read: broken). Since it's not possible for a user to know how an
application she wants to use works, we need to make sure there aren't any
ways for this kind of problem to arise because of some interaction that an
application has with Xerces.
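
To make the problem concrete, here is a minimal sketch of how a mutable static can leak from one application session into the next. It is not actual Xerces code: LeakyDefaults and ReusableJvmDemo are hypothetical names, and the "reset" between sessions is only simulated by a comment, on the assumption that the class holding the static is never reloaded.

```java
// Hypothetical stand-in for a parser-internal class; not real Xerces code.
final class LeakyDefaults {
    // Public, non-final and mutable: any application can change it.
    public static String[] RECOGNIZED = { "validation", "namespaces" };
}

final class ReusableJvmDemo {
    // Simulates what a later application session would observe.
    static String firstRecognized() {
        return LeakyDefaults.RECOGNIZED[0];
    }

    public static void main(String[] args) {
        // Application A tampers with the shared static value.
        LeakyDefaults.RECOGNIZED[0] = "tampered";

        // The JVM is "reset" here, but the class above is assumed to sit
        // low enough in the stack that its statics are left untouched.

        // Application B now sees A's modification.
        System.out.println(firstRecognized());
    }
}
```

Nothing in the second session did anything wrong, yet it observes state it never set. That is the situation the changes described in this note are meant to rule out.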
In practice, it doesn't look like meeting this requirement will cause much
disruption. We'll have to make many static variables throughout our code
private--or at least package-protected. Sometimes, we'll have to change
accessor methods to return clones of static objects instead of the actual
objects themselves. But, applications which don't use Xerces
internals--and applications shouldn't be using Xerces internals because
they could change at any moment for all kinds of other reasons--shouldn't
be affected.
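
As a rough illustration of the two techniques just mentioned (hiding the static and handing out clones), here is a minimal sketch. ComponentSketch is a hypothetical class, and the getter name getRecognizedFeatures is an assumption made for the example; only the general pattern is what matters.

```java
// Hypothetical component; illustrates the pattern only.
final class ComponentSketch {
    // Private and final: outside code can neither reassign the reference
    // nor reach the array directly.
    private static final String[] RECOGNIZED_FEATURES = {
        "http://xml.org/sax/features/validation",
        "http://xml.org/sax/features/namespaces",
    };

    // Return a copy, so a caller that modifies the result only
    // modifies its own array.
    public String[] getRecognizedFeatures() {
        return RECOGNIZED_FEATURES.clone();
    }

    public static void main(String[] args) {
        ComponentSketch component = new ComponentSketch();
        String[] copy = component.getRecognizedFeatures();
        copy[0] = "tampered";
        // The shared array is unaffected by the caller's change.
        System.out.println(component.getRecognizedFeatures()[0]);
    }
}
```

The cost is one small array allocation per call, which is acceptable as long as the getter is not on a hot path.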
So I'll be checking in changes over the next couple of weeks to make Xerces
"statically immutable" as the terminology goes. As I do this, I'll try to
(1) not affect any externally visible, and especially any externally useful
behaviour; (2) not impact Xerces's performance; (3) keep Xerces as
extensible as possible. Sandy's already been helping me with some of these
issues, so we'll have a second mind watching this stuff.
Here's a partial list of things that will need to be fixed. If anyone
thinks one of these changes might break them, this would be a great time to
speak up!
CoreDocumentImpl#kidOK: make private final.
Many classes: make RECOGNIZED_FEATURES and RECOGNIZED_PROPERTIES private
final, and return clones of these objects in the appropriate getter
methods. We only use these when building configuration pipelines
internally at the moment, so this shouldn't be a measurable performance
problem.
XSAttributeChecker: make ATTIDX_COUNT private
  make a few static members private final and rework some others so
    they're not exposed to the outside world.
XMLChar: make CHARS byte array private.
Version: Made fVersion final.
Base64: made base64Alphabet and lookUpBase64Alphabet final
Hexbin: made hexNumberTable and lookUpHexAlphabet final
UCSReader: made UCS(2|4)(B|L)E final
ExceptionMessages, ImplementationMessages, DatatypeMessages: we can
probably live without these files entirely since they aren't
ref'd anywhere.
XPath$XPathScanner: made fASCIICharMap final
XPath$Tokens: made fgTokenNames private instead of public
ParserForXMLSchema: changed range and ranges to private
REUtil: changed regexCache to be final; it is never changed, but
it never hurts to be sure...
RegularExpression: changed declaration of wordchar so that it is
local to the method in which it is initialized and used.
Token: changed blockNames to private from package-protected
  changed categories and categories2 from package protected to
    private final
  changed categoryNames to private
  token_0to9: package protected
  token_not_0to9: package protected
  token_not_wordchars: package protected
  token_wordchars: package protected
  token_not_wordedge: package protected
  token_wordedge: package protected
  token_dot: package protected
  token_not_spaces: package protected
  token_spaces: package protected
  token_empty: package protected
  token_linebeginning: package protected
  token_linebeginning2: package protected
  token_wordbeginning: package protected
  token_string_beginning: package protected
  token_lineend: package protected
  token_wordend: package protected
  token_stringend: package protected
  token_stringend2: package protected
  getCombiningCharacterSequence: protected->package protected
  getGraphemePattern: protected->package protected
SchemaSymbols: make fSchemaSymbols in class and in the inner class
private.
IDValue: made VS private.
XSDComplexTypeTraverser: removed fErrorContent; now it's created each
time getErrorContent is called, but that's only during error
conditions so this loses little.
Changed to make it work with a modified, restricted XSComplexTypeDecl.
HTMLSerializer: made _xhtml nonstatic and made XHTMLNamespace final
(this should be correctly initialized however).
SchemaGrammar: Modify so no static members remain visible.
XSComplexTypeDecl: In order that xsd:anyType can be of this type,
we'll have to rearrange this a bit.
XSSimpleTypeDecl: fix so that objects of this kind know whether
they're schema primitive types (therefore are static), and
therefore shouldn't be modified.
SchemaDVFactory: this is the hardest one; have to make setFactory make
sense in the context of static immutability.
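
For items like XSSimpleTypeDecl, where shared built-in instances must refuse modification, one possible shape is sketched below. This is purely illustrative and not the actual Xerces design: SimpleTypeSketch, its builtIn flag, and setName are all hypothetical names chosen for the example.

```java
// Hypothetical type declaration; not the real XSSimpleTypeDecl.
final class SimpleTypeSketch {
    // True for shared, statically held built-in types.
    private final boolean builtIn;
    private String name;

    // A built-in instance shared by every application in the JVM.
    static final SimpleTypeSketch STRING = new SimpleTypeSketch("string", true);

    SimpleTypeSketch(String name, boolean builtIn) {
        this.name = name;
        this.builtIn = builtIn;
    }

    String getName() {
        return name;
    }

    // Mutators check the flag first, so shared instances stay unchanged.
    void setName(String newName) {
        if (builtIn) {
            throw new IllegalStateException(
                "built-in types are shared and must not be modified");
        }
        name = newName;
    }

    public static void main(String[] args) {
        SimpleTypeSketch custom = new SimpleTypeSketch("myType", false);
        custom.setName("renamed"); // allowed: user-defined type
        try {
            STRING.setName("hacked"); // rejected: shared built-in
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The flag costs one boolean test per mutator call, so the performance impact of this particular approach should be small. Whether it fits the real class hierarchy is a separate question.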
Hope that makes sense. As always, thoughts much appreciated--especially if
they touch on how SchemaDVFactory, SchemaGrammar et al can be modified so
as not to hurt performance, retain functionality and yet achieve the
objective of making them statically immutable.
Cheers!
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: neilg@ca.ibm.com
---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
Re: Xerces and static mutability
Posted by Andy Clark <an...@apache.org>.
neilg@ca.ibm.com wrote:
> Version: Made fVersion final.
Here is one case (there may be others) where we CANNOT
make this field "final". This field was final to begin
with but then there was a problem reported by users --
IBMers, in fact.
Here's the deal: the users in question have code that
relies on the version information found in the Version
class. So they have code something like this:
if (Version.fVersion.equals("2.0.1")) { ... }
However, if the fVersion field is made final, the Java
compiler adds the constant directly to the compiled
class. Therefore, the comparison is always made against
the constant value at the time of compilation and not
the value in the Version class loaded at runtime.
See the problem?
The workaround, at that time, was for us to make this
field non-final by default. For backwards compatibility,
it should probably stay this way. In hindsight, though,
it probably would've been better to turn this into a
method that people called.
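
For what it's worth, the method-based alternative could look something like the sketch below. VersionSketch, VersionCheckDemo and getVersion are hypothetical names, and the version string is just the one from the example above. The idea is that a private final field cannot be read, and therefore cannot be inlined, by any other class, while callers go through a method that is resolved at run time.

```java
// Hypothetical replacement for a public version field.
final class VersionSketch {
    // Private and final: no outside class can modify it, and since no
    // outside class reads the field directly, no caller ends up with
    // the value compiled into its own class file.
    private static final String VERSION = "2.0.1";

    // Callers invoke this method, so they always get the value from
    // whichever VersionSketch class is loaded at run time.
    public static String getVersion() {
        return VERSION;
    }
}

final class VersionCheckDemo {
    static boolean isExpectedVersion() {
        return VersionSketch.getVersion().equals("2.0.1");
    }

    public static void main(String[] args) {
        System.out.println(isExpectedVersion());
    }
}
```

This would satisfy both concerns at once (no mutable static, no stale inlined constant), but it is an API change: existing code that reads the field directly would still need the field to remain.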
> XMLChar: make CHARS byte array private.
This was one where I wanted it to be publicly accessible
so that people didn't have to call methods for checking
the status of a character. But Java doesn't have a way
of making array contents read-only and the current state
of JVM technology is such that this is probably not an
issue.
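
A minimal sketch of the method-based approach follows. CharTableSketch and MASK_SPACE are hypothetical, heavily simplified names, and the table here only marks the four XML whitespace characters. The point it illustrates is that "final" protects the array reference but not its contents, so the table itself has to be private.

```java
// Hypothetical, heavily simplified character table.
final class CharTableSketch {
    private static final byte MASK_SPACE = 0x01;

    // "final" only stops the reference from being reassigned; the
    // elements stay writable, so the array must not be public.
    private static final byte[] CHARS = new byte[1 << 16];

    static {
        // XML whitespace: space, tab, line feed, carriage return.
        CHARS[0x20] = MASK_SPACE;
        CHARS[0x09] = MASK_SPACE;
        CHARS[0x0A] = MASK_SPACE;
        CHARS[0x0D] = MASK_SPACE;
    }

    // Read-only access through a small static method.
    public static boolean isSpace(int c) {
        return c >= 0 && c < CHARS.length && (CHARS[c] & MASK_SPACE) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isSpace(' '));
        System.out.println(isSpace('a'));
    }
}
```

Small static methods like isSpace are the kind of call that JIT compilers commonly inline, which is the reason the extra method call is unlikely to matter in practice.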
--
Andy Clark * andyc@apache.org