Posted to j-users@xerces.apache.org by Steven Murray <sm...@ebt.com> on 2001/04/23 15:16:55 UTC

FW: Entity References in Attribute values

I'm using the Xerces 1.3.1 parser, mainly through its SAX2 interfaces.  The
file I am parsing looks like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE xsl:stylesheet [
   <!ENTITY copy "&#169;">
]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ebt="http://www.ebt.com/2001/XSL/Transform">
    <xsl:variable name="ebt:company-name" select='"eBusiness Technologies"'/>
    <xsl:variable name="ebt:copyright" select='"Copyright &copy; 1999-2001 eBT"'/>
</xsl:stylesheet>

The problem I have is the sequencing of events for the entity reference
&copy;.  When I use the SAX2 event framework to process the xsl:variable
element named ebt:copyright, I get the
org.xml.sax.ext.LexicalHandler.startEntity event BEFORE I get the
org.xml.sax.ContentHandler.startElement event.

When you're processing a DOCTYPE (DTD) you have the startDTD and endDTD
markers to provide some context for the startEntity event.  However, there
is no such marker to provide context for an element, and thus there is no
way to detect that the entity you're processing belongs to an attribute
node.  As far as my code knows, it could just as well be processing an
Entity Reference node in element content, which it handles differently
from an entity reference in an attribute value.
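
To make the sequencing concrete, here is a minimal Java sketch (not the code
from our product) that logs SAX2 events in arrival order and uses
startDTD/endDTD as the only context markers available.  The class name
EventOrderProbe, the inDTD flag and the trimmed-down sample are made up for
illustration.  It assumes whatever JAXP parser is on the classpath rather than
Xerces 1.3.1 specifically, so the startEntity line may or may not show up: the
current LexicalHandler javadoc lists general entity references within
attribute values among the boundaries a parser cannot report.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Illustrative probe: records SAX2 events in the order the parser delivers them.
class EventOrderProbe extends DefaultHandler2 {
    // Trimmed-down version of the stylesheet quoted in this message.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE xsl:stylesheet [\n"
      + "  <!ENTITY copy \"&#169;\">\n"
      + "]>\n"
      + "<xsl:stylesheet version=\"1.0\"\n"
      + "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"\n"
      + "    xmlns:ebt=\"http://www.ebt.com/2001/XSL/Transform\">\n"
      + "  <xsl:variable name=\"ebt:copyright\"\n"
      + "      select='\"Copyright &copy; 1999-2001 eBT\"'/>\n"
      + "</xsl:stylesheet>\n";

    final List<String> events = new ArrayList<>();
    private boolean inDTD = false;   // the only context SAX2 gives for startEntity

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        inDTD = true;
        events.add("startDTD " + name);
    }

    @Override
    public void endDTD() {
        inDTD = false;
        events.add("endDTD");
    }

    @Override
    public void startEntity(String name) {
        // Outside the DTD there is no way to tell attribute value from content.
        events.add("startEntity " + name + " (inDTD=" + inDTD + ")");
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        events.add("startElement " + qName);
        String select = atts.getValue("select");
        if (select != null) {
            events.add("  select=" + select);   // value arrives already expanded
        }
    }

    // Parses the given document and returns the recorded event log.
    static List<String> record(String xml) {
        try {
            EventOrderProbe probe = new EventOrderProbe();
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(probe);
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", probe);
            reader.parse(new InputSource(new StringReader(xml)));
            return probe.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        record(SAMPLE).forEach(System.out::println);
    }
}
```

If a startEntity line for copy appears, it carries inDTD=false and nothing
else, which is exactly the ambiguity described above.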

Since the startElement event can only be fired once the attributes have been
fully processed (they are passed to it as a parameter), the only good
solution I can think of is to add startAttribute and endAttribute events to
the LexicalHandler, similar to what's done with the DTD events.  I am not
sure who controls the SAX2 specification, what its current state is, how
such additions would get into it, or what influence the Xerces team has in
this area.
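
As a rough sketch of what I have in mind: the interface below is purely
hypothetical.  Neither AttributeLexicalHandler nor its startAttribute and
endAttribute methods exist in SAX2 or in Xerces, and the tracker class is
only there to show how a handler could use the extra events to classify a
startEntity call, assuming a parser fired them around each attribute value.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;

// HYPOTHETICAL extension: not part of SAX2 or of any Xerces release.
interface AttributeLexicalHandler extends LexicalHandler {
    void startAttribute(String qName) throws SAXException;
    void endAttribute(String qName) throws SAXException;
}

// Illustrative handler that records the context of each entity reference.
class AttributeContextTracker implements AttributeLexicalHandler {
    final List<String> log = new ArrayList<>();
    private boolean inDTD = false;
    private boolean inAttribute = false;

    @Override public void startDTD(String name, String publicId, String systemId) {
        inDTD = true;
    }

    @Override public void endDTD() {
        inDTD = false;
    }

    @Override public void startAttribute(String qName) {
        inAttribute = true;
    }

    @Override public void endAttribute(String qName) {
        inAttribute = false;
    }

    @Override public void startEntity(String name) {
        // With the extra markers the three cases can be told apart.
        String context = inDTD ? "DTD" : inAttribute ? "attribute" : "content";
        log.add(name + " in " + context);
    }

    // Remaining LexicalHandler callbacks are not needed for this sketch.
    @Override public void endEntity(String name) { }
    @Override public void startCDATA() { }
    @Override public void endCDATA() { }
    @Override public void comment(char[] ch, int start, int length) { }
}
```

A parser supporting the proposal would call startAttribute("select"), then
startEntity("copy") and endEntity("copy"), then endAttribute("select"), all
before the startElement event for the owning element.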

I have been able to work around this problem by adding the public method
getScannerState() to the class
org.apache.xerces.framework.XMLDocumentScanner.  When the state is
SCANNER_STATE_ATTRIBUTE_VALUE I'm able to conclude that I'm processing an
Element which provides me the context I desire.  This solution is NOT
desirable in the long run as it requires me to modify and generate a new
xerces.jar file and I would rather distribute an official build of xerces
than a private one for our product offering.  However, providing a public
method to get the internal state of the XML scanner would not be a bad idea,
and would be my fallback position if a modification cannot be made to the
SAX-2 LexicalHandler interface.

I would appreciate hearing from anyone who has a better solution to my
problem.  If there isn't one, can the Xerces team have this modification to
the SAX2 LexicalHandler approved and implemented?  Failing that, a public
method to get the XML scanner state would be greatly appreciated.

Steven L. Murray
eBusiness Technologies
smurray@ebt.com


