Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2001/09/14 15:10:41 UTC

[Xerces2] Non-Normalized Values Implemented

I just committed the code to implement the non-normalized
value for internal entity declarations. I also implemented
the non-normalized value for attribute values. We had the
method in the XMLAttributes interface but it wasn't 
implemented before. (The next thing to do is remove the
various entity methods from the XMLAttributes interface.)

With these changes, you can get access to more information
about how entities are used in the DTD and document. For
example:

  <!-- entities.dtd -->
  <!ENTITY % yesno 'yes|no'>
  <!ENTITY % yesnomaybe '%yesno;|maybe'>
  <!ENTITY candy 'M &amp; M'>

  <!-- document.xml -->
  <!DOCTYPE root SYSTEM 'entities.dtd'>
  <root attr='&candy;'/>

Before, you could get this information in the DTD handler 
and document handler:

  internalEntityDecl("%yesno", "yes|no")
  internalEntityDecl("%yesnomaybe", "yes|no|maybe")
  internalEntityDecl("candy", "M &amp; M")

  startElement("root", { "attr", "M & M" })

And now, you get this information:

  internalEntityDecl("%yesno", "yes|no", "yes|no")
  internalEntityDecl("%yesnomaybe", "yes|no|maybe", "%yesno;|maybe")
  internalEntityDecl("candy", "M &amp; M", "M &amp; M")

  startElement("root", { "attr", "M & M", "&candy;" })

The DTD information can be used to analyze how parameter
entities are used to define content models, etc. Pretty
cool.
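
To make the difference between the two values concrete, here is a minimal, self-contained sketch in plain Java. It is not the Xerces scanner code: the class name EntityValueSketch and the expand helper are hypothetical, and it only handles parameter entity references (no character references or error reporting). It shows how the expanded text and the text as written can both be kept for each declaration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Xerces.
public class EntityValueSketch {
    // Matches a parameter entity reference such as %yesno;
    private static final Pattern PE_REF =
        Pattern.compile("%([A-Za-z_:][A-Za-z0-9._:-]*);");

    /** Replaces each %name; with the replacement text of an already declared entity. */
    static String expand(String literal, Map<String, String> declared) {
        Matcher m = PE_REF.matcher(literal);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(literal, last, m.start());
            String value = declared.get(m.group(1));
            // Unknown references are left untouched here; a real parser would report an error.
            out.append(value != null ? value : m.group());
            last = m.end();
        }
        out.append(literal, last, literal.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> declared = new LinkedHashMap<>();
        String[][] decls = {
            { "yesno", "yes|no" },
            { "yesnomaybe", "%yesno;|maybe" },
        };
        for (String[] decl : decls) {
            String nonNormalized = decl[1];                 // text as written in the DTD
            String text = expand(nonNormalized, declared);  // text after expansion
            declared.put(decl[0], text);
            System.out.println("internalEntityDecl(\"%" + decl[0] + "\", \""
                + text + "\", \"" + nonNormalized + "\")");
        }
    }
}
```

Running main prints the two parameter entity callbacks in the same shape as the listing above, with the expanded text second and the text as written third.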

I think this goes a long way to supporting the kind of
DTD information we want to receive without requiring 50
billion methods to break apart each declaration. Some
examples of information we still can't convey to the
DTD handler:

  <!ENTITY % space ''>
  <!ENTITY % prefix 'a:'>
  <!ENTITY % name '%prefix;name'>
  <!ENTITY % yesno 'yes|no'>

  <!ELEMENT%space;%name;%space;(#PCDATA)>
  <!ATTLIST %name; choice (%yesno;) #REQUIRED>

You would not get any notification of the use of parameter
entities %space; and %name; within the declarations and
also would not be able to know the usage of %yesno; for
the enumerated values. 

I do not think we should try to support the first case.
However, we could support the second if we added another
similar parameter to the attributeDecl method. What do
people think?

Also, I'm wondering if we should provide the same 
information for the default attribute value in the
attribute decl. Right now the handler only sees the
normalized default attribute value.

Comments and suggestions are welcome.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org


Re: [Xerces2] Non-Normalized Values Implemented

Posted by Andy Clark <an...@apache.org>.
I think some of my questions from my last post were lost in
the noise. Therefore, I'll restate them so that people can
comment, make suggestions, etc:

Should we modify the XMLDTDHandler to provide the following
information?

1) The non-normalized value for the attribute type
2) The non-normalized value for attribute enumeration values
3) The non-normalized value for default attribute values

The idea is to emit more information about how parameter
entities are commonly used within a DTD. I'm not trying
to go "all the way" with this -- I'm just trying to find
a comfortable middle ground that is even useful for XML
editor writers.

The current attributeDecl prototype is this:

  void attributeDecl(String elementName, String attributeName,
                     String type, String[] enumeration,
                     String defaultType, XMLString defaultValue)

However, some common uses for parameter entities within
attribute declarations are listed below. I'm only talking 
about the external subset here, folks, because the internal 
subset can only contain parameter entities with WHOLE 
markup declarations in them.

1) parameter entity used for attribute type

<!ENTITY % string 'CDATA'>
<!ATTLIST elem attr %string; #REQUIRED>

2) parameter entity used for enumeration values

<!ENTITY % yesno 'yes|no'>
<!ATTLIST elem attr (%yesno;) #REQUIRED>

<!ENTITY % yesno '(yes|no)'>
<!ATTLIST elem attr %yesno; #REQUIRED>

3) parameter entity used for default value

<!ENTITY % default '"yes"'>
<!ATTLIST elem attr (yes|no) %default;>

However, adding non-normalized values for all of these
options will really blow the method prototype up and it's
already quite long to begin with. We could pass the 
non-normalized value for the entire declaration and push 
the burden to the application. But this might lead people 
to ask "why don't we do that for *all* the markup 
declarations?". It's such a nasty thought I don't even
want to think about it, though...
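
For a sense of how long the prototype would get, here is a rough sketch of a widened callback in plain Java. Everything in it is hypothetical: the interface name, the three extra nonNormalized parameters, and the use of String where the real method takes XMLString are assumptions made so that the example compiles on its own. It is not a proposal for the actual XMLDTDHandler signature.

```java
// Hypothetical illustration only; not the actual XMLDTDHandler interface.
public class AttributeDeclSketch {

    /** A widened attributeDecl with one extra parameter per non-normalized value. */
    interface WidenedAttributeDeclHandler {
        void attributeDecl(String elementName, String attributeName,
                           String type, String nonNormalizedType,
                           String[] enumeration, String nonNormalizedEnumeration,
                           String defaultType,
                           String defaultValue, String nonNormalizedDefaultValue);
    }

    /**
     * Simulates the callback for:
     *   <!ENTITY % string 'CDATA'>
     *   <!ATTLIST elem attr %string; #REQUIRED>
     */
    static String describeStringTypedAttr() {
        StringBuilder seen = new StringBuilder();
        WidenedAttributeDeclHandler handler =
            (elem, attr, type, rawType, enums, rawEnums, defType, defValue, rawDefValue) ->
                seen.append(elem).append('@').append(attr)
                    .append(": type=").append(type)
                    .append(" (written as ").append(rawType).append(')');
        // Six of the nine arguments carry nothing useful for this simple declaration.
        handler.attributeDecl("elem", "attr", "CDATA", "%string;",
                              null, null, "#REQUIRED", null, null);
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(describeStringTypedAttr());
    }
}
```

Even for the simplest case most of the arguments are unused, which is the cost being weighed against passing the whole declaration text to the application.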

Comments?

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org



Re: [Xerces2] Non-Normalized Values Implemented

Posted by Andy Clark <an...@apache.org>.
I found a bug in the buffering of the non-normalized value
while scanning an attribute's value, but I have fixed it.

-- 
Andy Clark * IBM, TRL - Japan * andyc@apache.org
